IBM Granite

Developer(s): IBM Research [1]
Initial release: November 7, 2023 (2023-11-07)
Platform: IBM Watsonx (initially), GitHub, Hugging Face, RHEL AI
License: Proprietary; code models: open source (Apache 2.0) [2]

IBM Granite is a series of decoder-only foundation models created by IBM. It was announced on September 7, 2023, [3] [4] and an initial paper was published four days later. [5] Initially intended for use in Watsonx, IBM's cloud-based data and generative AI platform, alongside other models, [6] IBM later opened the source code of some of its code models. [7] Granite models are trained on datasets curated from the Internet, academic publications, code datasets, and legal and finance documents. [8] [9] [1]

Foundation models

A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks. [10]

Granite's first foundation models were Granite.13b.instruct and Granite.13b.chat. The "13b" in their names refers to their 13 billion parameters, fewer than most of the larger models of the time. Later models range from 3 to 34 billion parameters. [3] [11]

On May 6, 2024, IBM released the source code of four variations of Granite Code Models under Apache 2.0, a permissive open-source license that allows free use, modification, and redistribution of the software, and published them on Hugging Face for public use. [12] [13] According to IBM's own report, Granite 8b outperforms Llama 3 on several coding-related tasks within a similar range of parameters. [14] [15]

References

  1. McDowell, Steve. "IBM's New Granite Foundation Models Enable Safe Enterprise AI". Forbes.
  2. ibm-granite/granite-code-models, IBM Granite, 2024-05-08, retrieved 2024-05-08
  3. Nirmal, Dinesh (September 7, 2023). "Building AI for business: IBM's Granite foundation models". IBM.
  4. "IBM debuts Granite series of hardware-efficient language models". September 7, 2023.
  5. "Granite Foundation Models" (PDF). IBM. 2023-11-30.
  6. Fritts, Harold (2024-04-22). "IBM Adds Meta Llama 3 To watsonx, Expands AI Offerings". StorageReview.com. Retrieved 2024-05-08.
  7. Jindal, Siddharth (2024-05-07). "IBM Releases Open-Source Granite Code Models, Outperforms Llama 3". Analytics India Magazine. Retrieved 2024-05-08.
  8. Azhar, Ali (2024-04-08). "IBM Patents a Faster Method to Train LLMs for Enterprises". Datanami. Retrieved 2024-05-08.
  9. Wiggers, Kyle (2023-09-07). "IBM rolls out new generative AI features and models". TechCrunch. Retrieved 2024-05-08.
  10. "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. 18 August 2021.
  11. Pawar, Sahil (2023-09-11). "IBM Introduces Granite Series LLM Models for Watsonx Platform". Analytics Drift. Retrieved 2024-05-09.
  12. Nine, Adrianna (May 7, 2024). "IBM Makes Granite AI Models Open-Source Under New InstructLab Platform". ExtremeTech.
  13. "IBM open-sources its Granite AI models - and they mean business". ZDNET. Retrieved 2024-05-21.
  14. Jindal, Siddharth (2024-05-07). "IBM Releases Open-Source Granite Code Models, Outperforms Llama 3". Analytics India Magazine. Retrieved 2024-05-09.
  15. Synced (2024-05-13). "IBM's Granite Code: Powering Enterprise Software Development with AI Precision | Synced". syncedreview.com. Retrieved 2024-05-21.