DBRX

DBRX
Developer(s): Mosaic ML and Databricks team
Initial release: March 27, 2024
Repository: https://github.com/databricks/dbrx
License: Databricks Open Model License
Website: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

DBRX is an open-sourced large language model (LLM) developed by the Mosaic ML team at Databricks and released on March 27, 2024. [1] [2] [3] It is a mixture-of-experts transformer model with 132 billion parameters in total, of which 36 billion (4 out of 16 experts) are active for each token. [4] The model was released both as a base foundation model and as an instruction-tuned variant. [5]
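
Because DBRX is a mixture-of-experts model, all 132 billion parameters are stored, but each token is routed through only 4 of the 16 experts, so roughly 36 billion parameters participate in any single forward pass. The following is a minimal, illustrative top-4-of-16 routing layer in PyTorch; the hidden sizes, gating scheme, and expert shapes are assumptions chosen for brevity and do not reflect DBRX's actual implementation.

    # Toy top-k mixture-of-experts layer: 16 experts, 4 active per token.
    # Illustrative sketch only; sizes and gating details are assumptions, not DBRX's.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=4):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, num_experts)   # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ])

        def forward(self, x):                               # x: (num_tokens, d_model)
            scores = self.router(x)                         # (num_tokens, num_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)  # pick 4 experts per token
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):                  # each of the 4 chosen slots
                for e in idx[:, slot].unique().tolist():    # experts actually selected
                    mask = idx[:, slot] == e
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
            return out                                      # unselected experts never run

    tokens = torch.randn(8, 64)                             # 8 token embeddings
    print(ToyMoELayer()(tokens).shape)                      # torch.Size([8, 64])

In a full mixture-of-experts transformer, a layer of this kind replaces the dense feed-forward block in each transformer layer, which is why the total parameter count can greatly exceed the number of parameters active for any single token.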

At the time of its release, DBRX outperformed other prominent open-source models such as Meta's LLaMA 2, Mistral AI's Mixtral, and xAI's Grok on several benchmarks covering language understanding, programming ability, and mathematics. [4] [6] [7]

It was trained over 2.5 months [7] on 3,072 Nvidia H100 GPUs connected by 3.2 terabit per second InfiniBand, at a training cost of roughly $10 million. [1]

Related Research Articles

iFlytek Chinese technology company

iFlytek, styled as iFLYTEK, is a partially state-owned Chinese information technology company established in 1999. It creates voice recognition software and more than ten voice-based internet and mobile products covering the education, communication, music, and intelligent toy industries. State-owned enterprise China Mobile is the company's largest shareholder. The company is listed on the Shenzhen Stock Exchange and is backed by several state-owned investment funds.

<span class="mw-page-title-main">Databricks</span> American software company

Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company, founded in 2013 by the original creators of Apache Spark. The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models.

Alteryx, Inc. is an American computer software company based in Irvine, California, with a development center in Broomfield, Colorado, and offices worldwide. The company's products are used for data science and analytics.

OpenAI is an American artificial intelligence (AI) research organization founded in December 2015 and headquartered in San Francisco, California. Its stated mission is to develop "safe and beneficial" artificial general intelligence (AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As a leading organization in the ongoing AI boom, OpenAI is known for the GPT family of large language models, the DALL-E series of text-to-image models, and a text-to-video model named Sora. Its release of ChatGPT in November 2022 has been credited with catalyzing widespread interest in generative AI.

Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.

<span class="mw-page-title-main">Generative pre-trained transformer</span> Type of large language model

A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an artificial neural network that is used in natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

<span class="mw-page-title-main">GPT-J</span> Open source artificial intelligence text generating language model developed by EleutherAI

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.

A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.

<span class="mw-page-title-main">Llama (language model)</span> Large language model by Meta AI

Llama is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 3.3, released in December 2024.

xAI (company) Artificial intelligence focused startup

X.AI Corp., doing business as xAI, is an American startup company working in the area of artificial intelligence (AI). Founded by Elon Musk in March 2023, its stated goal is "to understand the true nature of the universe".

<span class="mw-page-title-main">Gemini (language model)</span> Large language model developed by Google

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name.

<span class="mw-page-title-main">Grok (chatbot)</span> Chatbot developed by xAI

Grok is a generative artificial intelligence chatbot developed by xAI. Based on the large language model (LLM) of the same name, it was launched in 2023 as an initiative by Elon Musk. The chatbot is advertised as having a "sense of humor" and direct access to X. It is currently under beta testing.

<span class="mw-page-title-main">Mistral AI</span> French artificial intelligence company

Mistral AI, headquartered in Paris, France, specializes in artificial intelligence (AI) products and focuses on open-weight large language models (LLMs). Founded in April 2023 by former engineers from Google DeepMind and Meta Platforms, the company has gained prominence as an alternative to proprietary AI systems. Named after the mistral, a powerful, cold wind in southern France, the company emphasizes openness and innovation in the AI field.

<span class="mw-page-title-main">Perplexity AI</span> AI search engine

Perplexity AI is a conversational search engine that uses large language models (LLMs) to answer queries. Its developer, Perplexity AI, Inc., is based in San Francisco, California.

Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University to address some limitations of transformer models, especially in processing long sequences. It is based on the Structured State Space sequence (S4) model.

Huawei PanGu, PanGu, PanGu-Σ or PanGu-π is a multimodal large language model developed by Huawei. It was announced on July 7, 2023, positioned as a contender to other multimodal large language models.

<span class="mw-page-title-main">IBM Granite</span> 2023 text-generating language model

IBM Granite is a series of decoder-only AI foundation models created by IBM. It was announced on September 7, 2023, and an initial paper was published four days later. Initially intended for use in IBM's cloud-based data and generative AI platform Watsonx along with other models, IBM has open-sourced some of its code models. Granite models are trained on datasets curated from the Internet, academic publications, code datasets, and legal and finance documents.

llama.cpp Software library for LLM inference

llama.cpp is an open source software library that performs inference on various large language models such as Llama. It is co-developed alongside the GGML project, a general-purpose tensor library.

<span class="mw-page-title-main">01.AI</span> Artificial intelligence company

01.AI is an artificial intelligence (AI) company based in Beijing, China. It focuses on developing open source products.

References

  1. 1 2 "Introducing DBRX: A New State-of-the-Art Open LLM". Databricks. 2024-03-27. Retrieved 2024-03-28.
  2. "New Databricks open source LLM targets custom development | TechTarget". Business Analytics. Retrieved 2024-03-28.
  3. Ghoshal, Anirban (2024-03-27). "Databricks' open-source DBRX LLM beats Llama 2, Mixtral, and Grok". InfoWorld. Retrieved 2024-03-28.
  4. 1 2 "A New Open Source LLM, DBRX Claims to be the Most Powerful – Here are the Scores". GIZMOCHINA. Mar 28, 2024.
  5. Wiggers, Kyle (2024-03-27). "Databricks spent $10M on new DBRX generative AI model". TechCrunch. Retrieved 2024-03-29.
  6. "Data and AI company DataBrix has launched a general-purpose large language model (LLM) DBRX that out." Maeil Business Newspaper . 2024-03-28. Retrieved 2024-03-28.
  7. 1 2 Knight, Will. "Inside the Creation of the World's Most Powerful Open Source AI Model". Wired. ISSN   1059-1028 . Retrieved 2024-03-28.