DBRX

Developer(s): Mosaic ML and Databricks
Initial release: March 27, 2024
Repository: https://github.com/databricks/dbrx
License: Databricks Open Model License
Website: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

DBRX is an open-source large language model (LLM) developed by the Mosaic ML team at Databricks and released on March 27, 2024. [1] [2] [3] It is a mixture-of-experts transformer model with 132 billion parameters in total, of which 36 billion (4 of its 16 experts) are active for each token. [4] The model is released in two variants: a base foundation model and an instruction-tuned model. [5]
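In a mixture-of-experts layer, a learned router sends each token through a small subset of expert feed-forward networks rather than one large one, so compute per token tracks the 36 billion active parameters rather than the full 132 billion. Below is a minimal PyTorch sketch of this top-k routing; the layer sizes and module names are illustrative assumptions, not DBRX's actual implementation, and the only DBRX-specific figures reflected are 16 experts with 4 active per token. [4]

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Toy top-k mixture-of-experts layer (illustrative, not DBRX's code)."""
        def __init__(self, d_model=64, d_ff=256, n_experts=16, k=4):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                             # x: (tokens, d_model)
            scores = self.router(x)                       # (tokens, n_experts)
            weights, idx = scores.topk(self.k, dim=-1)    # keep only the top-k experts
            weights = F.softmax(weights, dim=-1)          # renormalize over the chosen k
            out = torch.zeros_like(x)
            for slot in range(self.k):                    # only k experts run per token
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e              # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    moe = TopKMoE()
    print(moe(torch.randn(8, 64)).shape)                  # torch.Size([8, 64])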

DBRX outperforms prominent open-source models such as Meta's LLaMA 2, Mistral AI's Mixtral, and xAI's Grok, as well as closed-source models such as GPT-3.5, on several benchmarks spanning language understanding, programming, and mathematics. [4] [6] [7] As of March 28, 2024, this made DBRX the world's most powerful open-source model. [8]

It was trained over 2.5 months [8] on 3,072 Nvidia H100 GPUs connected by 3.2 terabits per second InfiniBand, for a training cost of roughly US$10 million. [1]
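As a rough sanity check on those figures, the implied price per GPU-hour can be computed directly; the 30-day month below is an assumption made for the estimate, not a figure from the article:

    gpus = 3072
    hours = 2.5 * 30 * 24                  # ~2.5 months, assuming 30-day months
    gpu_hours = gpus * hours
    print(f"{gpu_hours:,.0f} GPU-hours")   # 5,529,600 GPU-hours
    print(f"${10_000_000 / gpu_hours:.2f} implied cost per GPU-hour")  # ~$1.81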

Related Research Articles

A language model is a probabilistic model of a natural language. In 1980, the first significant statistical language model was proposed, and during the decade IBM performed ‘Shannon-style’ experiments, in which potential sources for language modeling improvement were identified by observing and analyzing the performance of human subjects in predicting or correcting text.
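Concretely, a probabilistic language model assigns a probability to a word sequence by factoring it into next-word predictions via the chain rule; this is the standard formulation rather than a detail of any particular model discussed here:

    $P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$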

Anthropic PBC is an American artificial intelligence (AI) startup company, founded by former members of OpenAI. Anthropic has developed a family of large language models named Claude.

iFlytek, styled as iFLYTEK, is a partially state-owned Chinese information technology company established in 1999. It creates voice-recognition software and more than ten voice-based internet/mobile products covering the education, communication, music, and intelligent-toy industries. State-owned enterprise China Mobile is the company's largest shareholder. The company is listed on the Shenzhen Stock Exchange with a market capitalization of 25 billion RMB, and it is backed by several state-owned investment funds.


Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark.


OpenAI is a U.S.-based artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As one of the leading organizations of the AI spring, it has developed several large language models and advanced image generation models, and has previously released open-source models. Its release of ChatGPT has been credited with starting the AI spring.

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer, a deep neural network architecture that supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token context, float16 (16-bit) precision, and a then-unprecedented 175 billion parameters, requiring 350 GB of storage since each parameter occupies 2 bytes, and it has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
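The 350 GB storage figure follows directly from the parameter count and the 2-byte float16 precision; a short check in Python:

    params = 175_000_000_000                       # 175 billion parameters
    bytes_per_param = 2                            # float16 is 2 bytes per parameter
    print(params * bytes_per_param / 1e9, "GB")    # 350.0 GB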


Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

Wu Dao is a multimodal artificial intelligence developed by the Beijing Academy of Artificial Intelligence (BAAI). Wu Dao 1.0 was first announced on January 11, 2021; an improved version, Wu Dao 2.0, was announced on May 31. It has been compared to GPT-3 and is built on a similar architecture; in comparison, GPT-3 has 175 billion parameters (variables and inputs within the machine learning model) while Wu Dao has 1.75 trillion parameters. Wu Dao was trained on 4.9 terabytes of images and texts, while GPT-3 was trained on 45 terabytes of text data; a growing body of work highlights the importance of increasing both data and parameters. The chairman of BAAI said that Wu Dao was an attempt to "create the biggest, most powerful AI model possible", although parameter count alone does not directly correlate with quality. Wu Dao 2.0 was called "the biggest language A.I. system yet" and was interpreted by commenters as an attempt to "compete with the United States". Notably, Wu Dao 2.0 uses a mixture-of-experts (MoE) architecture, unlike GPT-3, which is a "dense" model: MoE models require much less computational power to train than dense models with the same number of parameters, and trillion-parameter MoE models have shown performance comparable to models hundreds of times smaller.

Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.

Hugging Face, Inc. is a French-American company based in New York City that develops computer tools for building applications using machine learning. It is most notable for its transformers library, built for natural language processing applications, and for its platform that allows users to share machine learning models and datasets and showcase their work.


Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.


GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.


EleutherAI is a grass-roots non-profit artificial intelligence (AI) research group. The group, considered an open-source version of OpenAI, was formed in a Discord server in July 2020 to organize a replication of GPT-3. In early 2023, it formally incorporated as the EleutherAI Foundation, a non-profit research institute.

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.
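The "repeatedly predicting the next token" loop can be sketched as below. Here `model` and `tokenizer` are hypothetical stand-ins for any autoregressive LM interface, and greedy argmax selection stands in for the more elaborate sampling schemes used in practice:

    def generate(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
        """Greedy autoregressive decoding sketch (interfaces are hypothetical)."""
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model(tokens)                    # scores over the vocabulary
            next_token = max(range(len(logits)), key=logits.__getitem__)  # argmax
            tokens.append(next_token)
            if next_token == tokenizer.eos_token_id:  # stop at end-of-sequence
                break
        return tokenizer.decode(tokens)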

LLaMA is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.

Open-source artificial intelligence is the application of open-source practices to the development of artificial intelligence resources.


Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the generative artificial intelligence chatbot of the same name.


Grok is a generative artificial intelligence chatbot developed by xAI, based on a large language model (LLM). It was developed as an initiative by Elon Musk in direct response to the rise of OpenAI's ChatGPT; Musk co-founded OpenAI. The chatbot is advertised as "having a sense of humor" and as having direct access to Twitter (X). It is currently in beta testing for users with the premium version of X.

Mistral AI is a French company selling artificial intelligence (AI) products. It was founded in April 2023 by previous employees of Meta Platforms and Google DeepMind. The company raised €385 million in October 2023 and in December 2023 it was valued at more than $2 billion.

Jamba is an open-weights large language model (LLM) developed by AI21 Labs. It uses a novel hybrid architecture that combines a Mamba-based state space model (SSM) with transformer layers. It is a 52-billion-parameter model trained using a mixture-of-experts (MoE) technique, with 12 billion parameters active. Jamba can fit up to 256K tokens in its context window (140K tokens on a single 80 GB GPU) and is the largest Mamba-variant LLM created.

References

  1. 1 2 "Introducing DBRX: A New State-of-the-Art Open LLM". Databricks. 2024-03-27. Retrieved 2024-03-28.
  2. "New Databricks open source LLM targets custom development | TechTarget". Business Analytics. Retrieved 2024-03-28.
  3. Ghoshal, Anirban (2024-03-27). "Databricks' open-source DBRX LLM beats Llama 2, Mixtral, and Grok". InfoWorld. Retrieved 2024-03-28.
  4. 1 2 "A New Open Source LLM, DBRX Claims to be the Most Powerful – Here are the Scores". GIZMOCHINA. Mar 28, 2024.
  5. Wiggers, Kyle (2024-03-27). "Databricks spent $10M on new DBRX generative AI model". TechCrunch. Retrieved 2024-03-29.
  6. "Databricks releases DBRX: open-source LLM that beats GPT-3.5 and Llama 2". Techzine Europe. 2024-03-27. Retrieved 2024-03-28.
  7. "Data and AI company DataBrix has launched a general-purpose large language model (LLM) DBRX that out.. - MK". 매일경제. 2024-03-28. Retrieved 2024-03-28.
  8. 1 2 Knight, Will. "Inside the Creation of the World's Most Powerful Open Source AI Model". Wired. ISSN   1059-1028 . Retrieved 2024-03-28.