Groq

Groq, Inc.
Company type: Private
Industry: Artificial intelligence, semiconductors
Founded: 2016
Founder: Jonathan Ross
Headquarters: Mountain View, California, US
Key people: Jonathan Ross (CEO), Andrew S. Rappaport (board member), Chamath Palihapitiya (investor)
Products: Language Processing Unit (LPU)
Number of employees: 250 (2023)
Website: groq.com

Groq, Inc. is an American artificial intelligence (AI) company that builds an AI accelerator application-specific integrated circuit (ASIC), which it calls the Language Processing Unit (LPU), and related hardware to accelerate the inference performance of AI workloads.


Examples of the types of AI workloads that run on Groq's LPU include large language models, [1] [2] image classification, [3] anomaly detection, [4] [5] and predictive analysis. [6] [7]

Groq is headquartered in Mountain View, CA, and has offices in San Jose, CA; Liberty Lake, WA; Toronto, Canada; and London, U.K., as well as remote employees throughout North America and Europe.

History

Groq was founded in 2016 by a group of former Google engineers led by Jonathan Ross, one of the designers of the Tensor Processing Unit (TPU), an AI accelerator ASIC, and Douglas Wightman, an entrepreneur and former engineer at Google X (now known as X Development). [8]

Groq received seed funding from Social Capital's Chamath Palihapitiya, with a $10 million investment in 2017, [9] and soon after secured additional funding.

In April 2021, Groq raised $300 million in a Series C round led by Tiger Global Management and D1 Capital Partners. [10] Current investors include The Spruce House Partnership, Addition, GCM Grosvenor, Xⁿ, Firebolt Ventures, General Global Capital, and Tru Arrow Partners, as well as follow-on investments from TDK Ventures, XTX Ventures, Boardman Bay Capital Management, and Infinitum Partners. [11] [12] After its Series C funding round, Groq was valued at over $1 billion, making the startup a unicorn. [13]

On March 1, 2022, Groq acquired Maxeler Technologies, a company known for its dataflow systems technologies. [14]

On August 16, 2023, Groq selected Samsung Electronics' foundry in Taylor, Texas, to manufacture its next-generation chips on Samsung's 4-nanometer (nm) process node. This was the first order at the new Samsung chip factory. [15]

On February 19, 2024, Groq soft-launched a developer platform, GroqCloud, to attract developers to the Groq API. [16] On March 1, 2024, Groq acquired Definitive Intelligence, a startup known for offering a range of business-oriented AI solutions, to help with its cloud platform. [17]

Technology

A die photo of Groq's LPU v1.

Groq initially named its ASIC the Tensor Streaming Processor (TSP), but later rebranded it as the Language Processing Unit (LPU). [1] [18] [19]

The LPU features a functionally sliced microarchitecture, where memory units are interleaved with vector and matrix computation units. [20] [21] This design facilitates the exploitation of dataflow locality in AI compute graphs, improving execution performance and efficiency. The LPU's design rests on two key observations:

  1. AI workloads exhibit substantial data parallelism, which can be mapped onto purpose-built hardware, leading to performance gains. [20] [21]
  2. A deterministic processor design, coupled with a producer-consumer programming model, allows precise control and reasoning over hardware components, enabling optimized performance and energy efficiency. [20] [21]

In addition to its functionally sliced microarchitecture, the LPU is characterized by its single-core, deterministic architecture. [20] [22] The LPU achieves deterministic execution by avoiding traditional reactive hardware components (branch predictors, arbiters, reordering buffers, caches) [20] and by having all execution explicitly controlled by the compiler, thereby guaranteeing determinism in the execution of an LPU program. [21]
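
To make the producer-consumer model concrete, the following minimal sketch (written for this article, not taken from Groq's toolchain; the Op and static_schedule names are hypothetical) shows how a compiler can assign every operation a fixed start cycle ahead of time, so that execution order never depends on run-time arbitration:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Op:
        name: str
        latency: int       # cycles on its functional slice
        deps: tuple = ()   # names of the producer ops it consumes from

    def static_schedule(ops):
        """Assign each op a fixed start cycle at 'compile time':
        a consumer starts exactly when its producers finish, so the
        program's entire timing is known before it runs."""
        finish = {}
        plan = []
        for op in ops:  # ops listed in dependency order
            start = max((finish[d] for d in op.deps), default=0)
            finish[op.name] = start + op.latency
            plan.append((start, op.name))
        return sorted(plan)

    # A toy matmul -> bias -> activation pipeline
    program = [
        Op("load_weights", 4),
        Op("matmul", 8, deps=("load_weights",)),
        Op("bias_add", 2, deps=("matmul",)),
        Op("gelu", 3, deps=("bias_add",)),
    ]

    for cycle, name in static_schedule(program):
        print(f"cycle {cycle:3d}: issue {name}")

Because the schedule is fixed before execution, a compiler targeting such hardware can reason about exact cycle counts instead of relying on caches or branch predictors to recover performance at run time.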

The first generation of the LPU (LPU v1) yields a computational density of more than 1 TeraOp/s per square millimeter of silicon for its 25×29 mm, 14 nm chip operating at a nominal clock frequency of 900 MHz. [20] The second generation of the LPU (LPU v2) will be manufactured on Samsung's 4 nm process node. [15]
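
As a back-of-the-envelope check of that density figure (an illustrative calculation, not taken from the cited paper), the quoted die area and clock imply the following aggregate numbers:

    # LPU v1 figures quoted above: 25 x 29 mm die, >1 TeraOp/s per mm^2, 900 MHz
    die_area_mm2 = 25 * 29            # 725 mm^2
    density_ops = 1e12                # lower bound: 1 TeraOp/s per mm^2
    clock_hz = 900e6                  # nominal clock frequency

    total_ops = die_area_mm2 * density_ops    # > 7.25e14 ops/s (~725 TeraOps/s)
    ops_per_cycle = total_ops / clock_hz      # > ~805,000 ops per cycle
    print(f"> {total_ops / 1e12:.0f} TeraOps/s, > {ops_per_cycle:,.0f} ops/cycle")

That is, the quoted density corresponds to more than roughly 725 TeraOps/s across the whole die, or on the order of 800,000 operations issued per 900 MHz clock cycle.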

Performance

Groq emerged as the first API provider to break the 100 tokens per second generation rate while running Meta's 70-billion-parameter Llama 2 model. [23]

Groq currently hosts a variety of open-source large language models running on its LPUs for public access. [24] Access to these demos is available through Groq's website. The LPU's performance while running these open-source LLMs has been independently benchmarked by ArtificialAnalysis.ai against other LLM providers. [25] The LPU's measured performance is shown in the table below:

Language Processing Unit LLM performance

Model name                   Tokens/second (T/s)   Latency (seconds)
Llama2-70B [26] [27] [28]    253                   0.3
Mixtral [29]                 473                   0.3
Gemma [30]                   826                   0.3
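
For readers who want to reproduce this kind of measurement, the sketch below times output tokens per second through the GroqCloud API. It is a hedged example: the groq Python client, the model identifier, and the usage fields are assumptions based on the SDK's publicly documented OpenAI-style interface, not details taken from this article, and a single request also folds network and queueing delay into the result:

    import os
    import time
    from groq import Groq  # assumed: Groq's official Python SDK

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    t0 = time.time()
    resp = client.chat.completions.create(
        model="llama2-70b-4096",  # assumed model identifier
        messages=[{"role": "user",
                   "content": "Explain what an LPU is in one paragraph."}],
    )
    elapsed = time.time() - t0

    tokens = resp.usage.completion_tokens  # output tokens generated
    print(f"{tokens} tokens in {elapsed:.2f} s -> {tokens / elapsed:.0f} T/s")

A one-request measurement like this understates the steady-state generation rate reported in benchmarks, since it includes connection setup and time-to-first-token.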

Related Research Articles

Arm Holdings – British multinational semiconductor and software design company

Arm Holdings plc is a British semiconductor and software design company based in Cambridge, England, whose primary business is the design of central processing unit (CPU) cores that implement the ARM architecture family of instruction sets. It also designs other chips, provides software development tools under the DS-5, RealView and Keil brands, and provides systems and platforms, system-on-a-chip (SoC) infrastructure and software. As a "holding" company, it also holds shares of other companies. Since 2016, it has been majority owned by Japanese conglomerate SoftBank Group.

Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores. Manycore processors are used extensively in embedded computers and high-performance computing.

Nvidia Tesla – Nvidia's line of general-purpose GPUs

Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began using GPUs from the G80 series, and have continued to accompany the release of new chips. They are programmable using the CUDA or OpenCL APIs.

This is a comparison of ARM instruction set architecture application processor cores designed by ARM Holdings and 3rd parties. It does not include ARM Cortex-R, ARM Cortex-M, or legacy ARM cores.

A vision processing unit (VPU) is an emerging class of microprocessor; it is a specific type of AI accelerator, designed to accelerate machine vision tasks.

Tensor Processing Unit – AI accelerator ASIC by Google

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

An AI accelerator, deep learning processor, or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.

Nvidia DGX – Line of Nvidia-produced servers and workstations

The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.

SiFive – Fabless semiconductor company providing RISC-V processors

SiFive, Inc. is an American fabless semiconductor company and provider of commercial RISC-V processor IP and silicon chips based on the RISC-V instruction set architecture (ISA). Its products include cores, SoCs, IPs, and development boards.

Graphcore – British semiconductor company

Graphcore Limited is a British semiconductor company that develops accelerators for AI and machine learning. It has introduced a massively parallel Intelligence Processing Unit (IPU) that holds the complete machine learning model inside the processor.

Power10 – 2020 family of multi-core microprocessors by IBM

Power10 is a superscalar, multithreading, multi-core microprocessor family, based on the open source Power ISA, and announced in August 2020 at the Hot Chips conference. Systems with Power10 CPUs became generally available in September 2021, beginning with the IBM Power10 Enterprise E1080 server.

Sapphire Rapids is a codename for Intel's server and workstation processors based on the Golden Cove microarchitecture and produced using Intel 7. It features up to 60 cores and an array of accelerators, and it is the first generation of Intel server and workstation processors to use a chiplet design.

The NVIDIA Deep Learning Accelerator (NVDLA) is an open-source hardware neural network AI accelerator created by Nvidia. The accelerator is written in Verilog and is configurable and scalable to meet many different architecture needs. NVDLA is merely an accelerator, and any process must be scheduled and arbitrated by an outside entity such as a CPU.

Ampere (microarchitecture) – GPU microarchitecture by Nvidia

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.

Ampere Computing – American fabless semiconductor company

Ampere Computing LLC is an American fabless semiconductor company based in Santa Clara, California that develops processors for servers operating in large scale environments. Ampere also has offices in Portland, Oregon; Taipei, Taiwan; Raleigh, North Carolina; Bangalore, India; Warsaw, Poland; and Ho Chi Minh City, Vietnam.

Cerebras – American semiconductor company

Cerebras Systems Inc. is an American artificial intelligence company with offices in Sunnyvale, San Diego, Toronto, Tokyo, and Bangalore, India. Cerebras builds computer systems for complex artificial intelligence deep learning applications.

Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster, and with less energy, such as Lisp machines, neuromorphic engineering, event cameras, and physical neural networks. As of 2023, the market for AI hardware is dominated by GPUs.

Google Tensor – Series of system-on-chip processors

Google Tensor is a series of ARM64-based system-on-chip (SoC) processors designed by Google for its Pixel devices. It was originally conceptualized in 2016, following the introduction of the first Pixel smartphone, though actual developmental work did not enter full swing until 2020. The first-generation Tensor chip debuted on the Pixel 6 smartphone series in 2021 and was succeeded by the Tensor G2 chip in 2022 and G3 in 2023. Tensor has been generally well received by critics.

Meta AI is an artificial intelligence laboratory owned by Meta Platforms Inc. Meta AI develops various forms of artificial intelligence, including augmented and artificial reality technologies. It is also an academic research laboratory focused on generating knowledge for the AI community. This is in contrast to Facebook's Applied Machine Learning (AML) team, which focuses on practical applications of its products.

Multiverse Computing – Quantum computing company

Multiverse Computing is a Spanish quantum computing software company headquartered in San Sebastián, Spain, with offices in Paris, Munich, London, Toronto and Sherbrooke, Canada. The Spanish startup applies quantum and quantum-inspired algorithms to problems in energy, logistics, manufacturing, mobility, life sciences, finance, cybersecurity, chemistry, materials science and aerospace.

References

  1. Williams, Wayne (27 February 2024). "'Feels like magic!': Groq's ultrafast LPU could well be the first LLM-native processor — and its latest demo may well convince Nvidia and AMD to get out their checkbooks". TechRadar Pro. TechRadar. Retrieved 19 April 2024.
  2. Ward-Foxton, Sally. "Groq Demonstrates Fast LLMs on 4-Year-Old Silicon". EETimes. Retrieved 19 April 2024.
  3. Ward-Foxton, Sally. "Groq's AI Chip Debuts in the Cloud". EETimes. Retrieved 19 April 2024.
  4. Moorhead, Patrick. "US Army Analytics Group – Cybersecurity Anomaly Detection 1000X Faster With Less False Positives". Forbes. Retrieved 19 April 2024.
  5. Herman, Arthur. "Cybersecurity Is Entering The High-Tech Era". Forbes. Retrieved 19 April 2024.
  6. Heinonen, Nils. "Researchers accelerate fusion research with Argonne's Groq AI platform". Argonne Leadership Computing Facility. Retrieved 19 April 2024.
  7. Larwood, Mariah; Cerny, Beth. "Argonne deploys new Groq system to ALCF AI Testbed, providing AI accelerator access to researchers globally". Argonne Leadership Computing Facility. Retrieved 19 April 2024.
  8. Levy, Ari (21 April 2017). "Several Google engineers have left one of its most secretive AI projects to form a stealth start-up". CNBC. Retrieved 19 April 2024.
  9. Clark, Kate (6 September 2018). "Secretive semiconductor startup Groq raises $52M from Social Capital". TechCrunch. Retrieved 19 April 2024.
  10. King, Ian. "Tiger Global, D1 Lead $300 Million Round in AI Chip Startup Groq". Bloomberg. Retrieved 19 April 2024.
  11. Wheatly, Mike (14 April 2021). "AI chipmaker Groq raises $300M in Series C round". Silicon Angle. Retrieved 19 April 2024.
  12. McFarland, Alex. "AI Chip Startup Groq Closes $300 Million in Series C Fundraising". Unite.AI. Retrieved 19 April 2024.
  13. Andonov, Kaloyan; Lavine, Rob (19 April 2021). "Analysis: Groq computes a $300m series C". Global Venturing. Retrieved 19 April 2024.
  14. Prickett Morgan, Timothy (2 March 2022). "Groq Buys Maxeler for Its HPC and AI Dataflow Expertise". The Next Platform. Retrieved 19 April 2024.
  15. Hwang, Jeong-Soo. "Samsung's new US chip fab wins first foundry order from Groq". The Korea Economic Daily. Retrieved 19 April 2024.
  16. Franzen, Carl (March 2024). "Groq launches developer playground GroqCloud with newly acquired Definitive Intelligence". Venture Beat. Retrieved 19 April 2024.
  17. Wiggers, Kyle (March 2024). "AI chip startup Groq forms new business unit, acquires Definitive Intelligence". TechCrunch. Retrieved 19 April 2024.
  18. Mellor, Chris (23 January 2024). "Grokking Groq's Groqness". Blocks & Files. Retrieved 19 April 2024.
  19. Abts, Dennis; Ross, Jonathan; Sparling, Jonathan; Wong-VanHaren, Mark; Baker, Max; Hawkins, Tom; Bell, Andrew; Thompson, John; Kahsai, Temesghen; Kimmell, Garrin; Hwang, Jennifer; Leslie-Hurd, Rebekah; Bye, Michael; Creswick, E.R.; Boyd, Matthew; Venigalla, Mahitha; Laforge, Evan; Purdy, Jon; Kamath, Purushotham; Maheshwari, Dinesh; Beidler, Michael; Rosseel, Geert; Ahmad, Omar; Gagarin, Gleb; Czekalski, Richard; Rane, Ashay; Parmar, Sahil; Werner, Jeff; Sproch, Jim; Macias, Adrian; Kurtz, Brian (May 2020). "Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads" (PDF). 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). pp. 145–158. doi:10.1109/ISCA45697.2020.00023. ISBN   978-1-7281-4661-4.
  20. Abts, Dennis; Kimmell, Garrin; Ling, Andrew; Kim, John; Boyd, Matt; Bitar, Andrew; Parmar, Sahil; Ahmed, Ibrahim; Dicecco, Roberto; Han, David; Thompson, John; Bye, Michael; Hwang, Jennifer; Fowers, Jeremy; Lillian, Peter; Murthy, Ashwin; Mehtabuddin, Elyas; Tekur, Chetan; Sohmers, Thomas; Kang, Kris; Maresh, Stephen; Ross, Jonathan (2022-06-11). "A software-defined tensor streaming multiprocessor for large-scale machine learning". Proceedings of the 49th Annual International Symposium on Computer Architecture. pp. 567–580. doi:10.1145/3470496.3527405. ISBN 978-1-4503-8610-4.
  21. Abts, Dennis; Kimmell, Garrin; Ling, Andrew; Kim, John; Boyd, Matt; Bitar, Andrew; Parmar, Sahil; Ahmed, Ibrahim; Dicecco, Roberto; Han, David; Thompson, John; Bye, Michael; Hwang, Jennifer; Fowers, Jeremy; Lillian, Peter; Murthy, Ashwin; Mehtabuddin, Elyas; Tekur, Chetan; Sohmers, Thomas; Kang, Kris; Maresh, Stephen; Ross, Jonathan (June 11, 2022). "A software-defined tensor streaming multiprocessor for large-scale machine learning". Proceedings of the 49th Annual International Symposium on Computer Architecture. pp. 567–580. doi:10.1145/3470496.3527405. ISBN 978-1-4503-8610-4. Retrieved 2024-03-18.
  22. Singh, Satnam (February 11, 2022). "The Virtuous Cycles of Determinism: Programming Groq's Tensor Streaming Processor". Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. p. 153. doi:10.1145/3490422.3510453. ISBN 978-1-4503-9149-8. Retrieved 2024-03-18.
  23. Smith-Goodson, Paul. "Groq's Record-Breaking Language Processor Hits 100 Tokens Per Second On A Massive AI Model". Forbes. Retrieved 19 April 2024.
  24. Morrison, Ryan (27 February 2024). "Meet Groq — the chip designed to run AI models really, really fast". Tom’s Guide. Retrieved 19 April 2024.
  25. "Groq Shows Promising Results in New LLM Benchmark, Surpassing Industry Averages". HPCwire. 2024-02-13. Retrieved 2024-03-18.
  26. "Llama-2 Chat 70B Providers". artificialanalysis.ai. Retrieved 2024-03-18.
  27. "Groq Shows Promising Results in New LLM Benchmark, Surpassing Industry Averages". Datanami. 2024-02-13. Retrieved 2024-03-18.
  28. "Groq Demos Fast LLMs on 4-Year-Old Silicon". EE Times. 2023-09-12. Retrieved 2024-03-18.
  29. "Mixtral 8x7B Instruct Providers". artificialanalysis.ai. Retrieved 2024-03-18.
  30. "Gemma-7B Models Providers". artificialanalysis.ai. Retrieved 2024-03-18.