IBM Granite

Granite
Developer(s): IBM Research [1]
Initial release: November 7, 2023 (2023-11-07)
Platform: IBM Watsonx (initially), GitHub, Hugging Face, RHEL AI
Type: Decoder-only foundation models
License: Proprietary; code models: open source (Apache 2.0) [2]
Website: www.ibm.com/granite

IBM Granite is a series of decoder-only AI foundation models created by IBM. [3] It was announced on September 7, 2023, [4] [5] and an initial paper was published four days later. [6] Initially intended for use in Watsonx, IBM's cloud-based data and generative AI platform, alongside other models, [7] IBM later released the source code of some of its code models. [8] [9] Granite models are trained on datasets curated from the Internet, academic publications, code datasets, and legal and financial documents. [10] [11] [1]


Foundation models

A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks. [12]

Granite's first foundation models were Granite.13b.instruct and Granite.13b.chat. The "13b" in their names refers to their 13 billion parameters, fewer than most of the larger models of the time. Later models range from 3 to 34 billion parameters. [4] [13]

On May 6, 2024, IBM released the source code of four variations of its Granite Code Models under the Apache License 2.0, a permissive open-source license that allows free use, modification, and redistribution of the software, and published them on Hugging Face for public use. [14] [15] According to IBM's own report, Granite 8b outperforms Llama 3 on several coding-related tasks among models of a similar parameter count. [16] [17]
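Because the open-sourced code models are hosted on Hugging Face, they can be loaded with the widely used `transformers` library. The sketch below is illustrative only: the repository id `ibm-granite/granite-8b-code-base` and the generation settings are assumptions, and the first call downloads many gigabytes of weights; check the `ibm-granite` organization on Hugging Face for the exact model names.

```python
# Illustrative sketch: code completion with an open-sourced Granite code model.
# The repo id below is an assumption; see the "ibm-granite" org on Hugging Face.
MODEL_ID = "ibm-granite/granite-8b-code-base"

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the model on first use (tens of GB) and complete `prompt`."""
    # Imported lazily so the module can be inspected without the heavy deps.
    # Requires: pip install transformers accelerate torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage (not run here): complete("def fibonacci(n):")
```

This follows the standard `transformers` loading pattern rather than anything Granite-specific; the Apache 2.0 license on the code models is what permits this kind of local, unrestricted use.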

See also

SingleStore
Anaconda (Python distribution)
Databricks
Dynatrace
Multimodal learning
Writer Inc.
GPT-2
Prompt engineering
Hugging Face
Generative pre-trained transformer
Large language model
Llama (language model)
PaLM
Open-source artificial intelligence
Watsonx
Aleph Alpha
Mistral AI
Huawei PanGu
IBM Think
DBRX

References

  1. McDowell, Steve. "IBM's New Granite Foundation Models Enable Safe Enterprise AI". Forbes.
  2. ibm-granite/granite-code-models, IBM Granite, 2024-05-08. Retrieved 2024-05-08.
  3. "IBM Granite". IBM. 24 June 2024.
  4. Nirmal, Dinesh (September 7, 2023). "Building AI for business: IBM's Granite foundation models". IBM.
  5. "IBM debuts Granite series of hardware-efficient language models". September 7, 2023.
  6. "Granite Foundation Models" (PDF). IBM. 2023-11-30.
  7. Fritts, Harold (2024-04-22). "IBM Adds Meta Llama 3 To watsonx, Expands AI Offerings". StorageReview.com. Retrieved 2024-05-08.
  8. Jindal, Siddharth (2024-05-07). "IBM Releases Open-Source Granite Code Models, Outperforms Llama 3". Analytics India Magazine. Retrieved 2024-05-08.
  9. "Open sourcing IBM's Granite code models". 9 February 2021.
  10. Azhar, Ali (2024-04-08). "IBM Patents a Faster Method to Train LLMs for Enterprises". Datanami. Retrieved 2024-05-08.
  11. Wiggers, Kyle (2023-09-07). "IBM rolls out new generative AI features and models". TechCrunch. Retrieved 2024-05-08.
  12. "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. 18 August 2021.
  13. Pawar, Sahil (2023-09-11). "IBM Introduces Granite Series LLM Models for Watsonx Platform". Analytics Drift. Retrieved 2024-05-09.
  14. Nine, Adrianna (May 7, 2024). "IBM Makes Granite AI Models Open-Source Under New InstructLab Platform". ExtremeTech.
  15. "IBM open-sources its Granite AI models - and they mean business". ZDNET. Retrieved 2024-05-21.
  16. Jindal, Siddharth (2024-05-07). "IBM Releases Open-Source Granite Code Models, Outperforms Llama 3". Analytics India Magazine. Retrieved 2024-05-09.
  17. Synced (2024-05-13). "IBM's Granite Code: Powering Enterprise Software Development with AI Precision | Synced". syncedreview.com. Retrieved 2024-05-21.