Company type | Private |
---|---|
Industry | Semiconductors, artificial intelligence |
Founded | 2016 |
Founders | Jonathan Ross, Douglas Wightman |
Headquarters | Mountain View, California, US |
Key people | Jonathan Ross (CEO), Andrew S. Rappaport (Board Member), Chamath Palihapitiya (Investor) |
Products | Language Processing Unit (LPU) |
Revenue | US$3.2 million (2023) [1] |
Net income | US$−88 million (2023) [1] |
Number of employees | 250 (2023) |
Website | groq.com |
Groq, Inc. is an American artificial intelligence (AI) company that builds an AI accelerator application-specific integrated circuit (ASIC), which it calls the Language Processing Unit (LPU), and related hardware to accelerate the inference performance of AI workloads.
Examples of AI workloads that run on Groq's LPU include large language models (LLMs), [2] [3] image classification, [4] anomaly detection, [5] [6] and predictive analysis. [7] [8]
Groq is headquartered in Mountain View, California, and has offices in San Jose, California; Liberty Lake, Washington; Toronto, Canada; and London, U.K., with remote employees throughout North America and Europe.
Groq was founded in 2016 by a group of former Google engineers led by Jonathan Ross, one of the designers of the Tensor Processing Unit (TPU), an AI accelerator ASIC, and Douglas Wightman, an entrepreneur and former engineer at Google X (now known as X Development), who served as the company's first CEO. [9] [1]
Groq received seed funding from Social Capital's Chamath Palihapitiya, with a $10 million investment in 2017 [10] and soon after secured additional funding.
In April 2021, Groq raised $300 million in a series C round led by Tiger Global Management and D1 Capital Partners. [11] Current investors include The Spruce House Partnership, Addition, GCM Grosvenor, Xⁿ, Firebolt Ventures, General Global Capital, and Tru Arrow Partners, as well as follow-on investments from TDK Ventures, XTX Ventures, Boardman Bay Capital Management, and Infinitum Partners. [12] [13] After its series C funding round, Groq was valued at over $1 billion, making the startup a unicorn. [14]
On March 1, 2022, Groq acquired Maxeler Technologies, a company known for its dataflow systems technologies. [15]
On August 16, 2023, Groq selected Samsung Electronics' foundry in Taylor, Texas, to manufacture its next-generation chips on Samsung's 4-nanometer (nm) process node. This was the first order at the new factory. [16]
On February 19, 2024, Groq soft launched a developer platform, GroqCloud, to attract developers to the Groq API and rent out access to its chips. [17] [1] On March 1, 2024, Groq acquired Definitive Intelligence, a startup known for offering a range of business-oriented AI solutions, to help with its cloud platform. [18]
Groq raised $640 million in a series D round led by BlackRock Private Equity Partners in August 2024, valuing the company at $2.8 billion. [1] [19]
Groq initially named its ASIC the Tensor Streaming Processor (TSP), but later rebranded it as the Language Processing Unit (LPU). [2] [20] [21]
The LPU features a functionally sliced microarchitecture, in which memory units are interleaved with vector and matrix computation units. [22] [23] This design facilitates the exploitation of dataflow locality in AI compute graphs, improving execution performance and efficiency. The LPU's design was driven by two key observations: AI workloads exhibit substantial data parallelism that can be mapped onto purpose-built hardware, and a deterministic processor design makes hardware performance precisely predictable.
In addition to its functionally sliced microarchitecture, the LPU is characterized by its single-core, deterministic architecture. [22] [24] The LPU achieves deterministic execution by avoiding traditional reactive hardware components (branch predictors, arbiters, reordering buffers, caches) [22] and by having the compiler explicitly control all execution, thereby guaranteeing determinism when running an LPU program. [23]
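The consequence of compiler-controlled scheduling can be illustrated with a toy model (a sketch for illustration only, not Groq's actual toolchain): if every instruction's issue cycle and latency are fixed at compile time, a program's total runtime is known before it ever executes.

```python
# Toy model of compiler-scheduled, deterministic execution.
# Each instruction carries a fixed issue cycle and latency chosen at
# "compile time"; runtime is therefore a pure function of the schedule,
# with no branch predictors, arbiters, or caches to introduce variance.

from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    issue_cycle: int   # cycle the compiler scheduled this instruction at
    latency: int       # fixed functional-unit latency in cycles

def compile_schedule(ops):
    """Greedy static scheduler: each op issues as soon as its
    predecessor's result is ready (no runtime arbitration)."""
    schedule, ready = [], 0
    for name, latency in ops:
        schedule.append(Instr(name, ready, latency))
        ready += latency
    return schedule

def total_cycles(schedule):
    # Deterministic: completion time is known before execution.
    return max(i.issue_cycle + i.latency for i in schedule)

program = [("load", 4), ("matmul", 10), ("vecadd", 2), ("store", 4)]
sched = compile_schedule(program)
print(total_cycles(sched))  # 20 cycles, computed without "running" anything
```

The instruction names and latencies here are invented for the sketch; the point is only that a statically scheduled pipeline yields an execution time the compiler can reason about exactly.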
The first generation of the LPU (LPU v1) yields a computational density of more than 1 TeraOp/s per square millimeter of silicon for its 25×29 mm, 14 nm chip operating at a nominal clock frequency of 900 MHz. [22] The second generation of the LPU (LPU v2) will be manufactured on Samsung's 4 nm process node. [16]
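The density figure implies a lower bound on aggregate throughput. A quick back-of-the-envelope check, using only the numbers quoted above:

```python
# Back-of-the-envelope check of the LPU v1 density claim,
# using only the figures quoted in the text above.

die_w_mm, die_h_mm = 25, 29           # quoted die dimensions
density_tops_per_mm2 = 1.0            # "more than 1 TeraOp/s per mm^2"

die_area_mm2 = die_w_mm * die_h_mm    # 725 mm^2
min_total_tops = die_area_mm2 * density_tops_per_mm2

print(die_area_mm2)      # 725
print(min_total_tops)    # implies > 725 TeraOp/s aggregate throughput
```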
Groq was the first API provider to surpass a generation rate of 100 tokens per second while running Meta's 70-billion-parameter Llama 2 model. [25]
Groq currently hosts a variety of open-source large language models running on its LPUs for public access. [26] Access to these demos is available through Groq's website. The LPU's performance while running these open-source LLMs has been independently benchmarked by ArtificialAnalysis.ai against other LLM providers. [27] The LPU's measured performance is shown in the table below:
Model name | Tokens/second | Latency (seconds) |
---|---|---|
Llama2-70B [28] [29] [30] | 253 | 0.3 |
Mixtral [31] | 473 | 0.3 |
Gemma [32] | 826 | 0.3 |
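The two benchmark columns combine into a simple service-time estimate, time ≈ latency + tokens ÷ throughput. A small illustration (the figures are the table's benchmark numbers, not additional measurements):

```python
# Estimate end-to-end time to generate N tokens from the benchmarked
# throughput and first-token latency in the table above.

benchmarks = {
    # model: (tokens_per_second, latency_seconds)
    "Llama2-70B": (253, 0.3),
    "Mixtral":    (473, 0.3),
    "Gemma":      (826, 0.3),
}

def estimated_seconds(model, n_tokens):
    """Latency to first token plus steady-state generation time."""
    tps, latency = benchmarks[model]
    return latency + n_tokens / tps

for model in benchmarks:
    print(f"{model}: {estimated_seconds(model, 500):.2f}s for 500 tokens")
```

This linear model is a simplification; real serving times also depend on prompt length, batching, and load.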