Developer(s) | IBM Research [1]
---|---
Initial release | November 7, 2023
Platform | IBM Watsonx (initially), GitHub, Hugging Face, RHEL AI
License | Proprietary; code models: open source (Apache 2.0) [2]
IBM Granite is a series of decoder-only foundation models created by IBM. It was announced on September 7, 2023, [3] [4] and an initial paper was published four days later. [5] Initially intended for use in Watsonx, IBM's cloud-based data and generative AI platform, alongside other models, [6] IBM later open-sourced some of its code models. [7] Granite models are trained on datasets curated from the Internet, academic publications, code datasets, and legal and financial documents. [8] [9] [1]
A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks. [10]
Granite's first foundation models were Granite.13b.instruct and Granite.13b.chat. The "13b" in their names refers to their 13 billion parameters, fewer than most of the larger models of the time. Later models range from 3 to 34 billion parameters. [3] [11]
On May 6, 2024, IBM released the source code of four variations of its Granite Code Models under Apache 2.0, a permissive open-source license that allows free use, modification and redistribution of the software, and published them on Hugging Face for public use. [12] [13] According to IBM's own report, Granite 8b outperforms Llama 3 on several coding-related tasks within a similar parameter range. [14] [15]