Company type | Private |
---|---|
Industry | |
Founded | 2015 |
Founders | Andrew Feldman, Gary Lauterbach, Michael James, Sean Lie, Jean-Philippe Fricker |
Headquarters | , US |
Key people | Andrew Feldman (CEO) |
Products | Wafer Scale Engine |
Revenue | $78.7 million (2023) [1] |
Net income | $−127 million (2023) [1] |
Number of employees | 401 (2024) [2] |
Website | cerebras |
Cerebras Systems Inc. is an American artificial intelligence (AI) company with offices in Sunnyvale, San Diego, Toronto, and Bangalore, India. [3] [4] Cerebras builds computer systems for complex AI deep learning applications. [5]
Cerebras was founded in 2015 by Andrew Feldman, Gary Lauterbach, Michael James, Sean Lie and Jean-Philippe Fricker. [6] These five founders worked together at SeaMicro, which was started in 2007 by Feldman and Lauterbach and was later sold to AMD in 2012 for $334 million. [7] [8]
In May 2016, Cerebras secured $27 million in series A funding led by Benchmark, Foundation Capital and Eclipse Ventures. [9] [6]
In December 2016, series B funding was led by Coatue Management, followed in January 2017 with series C funding led by VY Capital. [6]
In November 2018, Cerebras closed its series D round with $88 million, making the company a unicorn. Investors in this round included Altimeter, VY Capital, Coatue, Foundation Capital, Benchmark, and Eclipse. [10] [11]
On August 19, 2019, Cerebras announced its first-generation Wafer-Scale Engine (WSE). [12] [13] [14]
In November 2019, Cerebras closed its series E round with over $270 million for a valuation of $2.4 billion. [15]
In 2020, the company announced an office in Japan and partnership with Tokyo Electron Devices. [16]
In April 2021, Cerebras announced the CS-2 based on the company's Wafer Scale Engine Two (WSE-2), which has 850,000 cores. [17] In August 2021, the company announced its brain-scale technology that can run a neural network with over 120 trillion connections. [18]
In November 2021, Cerebras announced that it had raised an additional $250 million in Series F funding, valuing the company at over $4 billion. The Series F financing round was led by Alpha Wave Ventures and Abu Dhabi Growth Fund (ADG). [19] With that round, the company had raised a total of $720 million in financing. [19] [20]
In August 2022, Cerebras was honored by the Computer History Museum in Mountain View, California. The museum added the WSE-2, the biggest computer chip made so far, to its permanent collection and unveiled a new display featuring it, marking an "epochal" achievement in the history of fabricating transistors as an integrated part. [21] [22]
Cerebras filed its prospectus for initial public offering (IPO) in September 2024, with the intention of listing on the Nasdaq exchange under the ticker 'CBRS'. The prospectus indicated that most of its revenue at the time came from Emirati AI holding company G42. [23] A week after the filing, it was reported that the Committee on Foreign Investment in the United States was reviewing G42's investment into the company, leading to a potential delay in its IPO. [24]
Cerebras was named to the Forbes AI 50 in April 2024 [25] and the TIME 100 Most Influential Companies list in May 2024. [26]
The Cerebras Wafer Scale Engine (WSE) is a single, wafer-scale integrated processor that includes compute, memory and interconnect fabric. The WSE-1 powers the Cerebras CS-1, Cerebras' first-generation AI computer. [27] The CS-1 is a 19-inch rack-mounted appliance designed for AI training and inference workloads in a datacenter. [13] It includes a single WSE as its primary processor, as well as twelve 100 Gigabit Ethernet connections to move data in and out. [28] [13] The WSE-1 has 1.2 trillion transistors, 400,000 compute cores and 18 gigabytes of memory. [12] [13] [14]
In April 2021, Cerebras announced the CS-2 AI system based on the second-generation Wafer Scale Engine (WSE-2), manufactured on TSMC's 7 nm process. [17] It is 26 inches tall and fits in one-third of a standard data center rack. [29] [17] The Cerebras WSE-2 has 850,000 cores and 2.6 trillion transistors. [29] [30] The WSE-2 expanded on-chip SRAM to 40 gigabytes, memory bandwidth to 20 petabytes per second and total fabric bandwidth to 220 petabits per second. [31] [32]
In August 2021, the company announced a system which connects multiple integrated circuits (commonly called "chips") into a neural network with many connections. [18] It enables a single system to support AI models with more than 120 trillion parameters. [33]
In June 2022, Cerebras set a record for the largest AI models ever trained on one device. [34] Cerebras said that for the first time ever, a single CS-2 system with one Cerebras wafer can train models with up to 20 billion parameters. [35] The Cerebras CS-2 system can train multibillion-parameter natural language processing (NLP) models including GPT-3XL 1.3 billion models, as well as GPT-J 6B, GPT-3 13B and GPT-NeoX 20B with reduced software complexity and infrastructure. [35] [34]
In September 2022, Cerebras announced that it could link its chips together to create what would be the largest-ever computing cluster for AI computing. [36] A Wafer-Scale Cluster can connect up to 192 CS-2 AI systems into a cluster, while a cluster of 16 CS-2 AI systems can create a computing system with 13.6 million cores for natural language processing. [36] The key to the new Cerebras Wafer-Scale Cluster is the exclusive use of data parallelism for training, which has been described as the preferred approach for all AI work. [37]
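The data parallelism mentioned above can be illustrated with a minimal sketch: every worker holds a full copy of the model, computes a gradient on its own shard of the data, and the gradients are averaged before a single shared update. The toy below fits a one-parameter linear model in plain Python. All function names, the worker count, and the learning rate are hypothetical choices made for illustration; it does not represent Cerebras software.

```python
# Toy data-parallel training of y = w * x with a mean-squared-error loss.
# Every name here is an illustrative assumption, not a Cerebras API.

def shard(data, num_workers):
    """Split the dataset into num_workers interleaved shards."""
    return [data[i::num_workers] for i in range(num_workers)]

def local_gradient(w, samples):
    """Gradient of the mean squared error for y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

def data_parallel_step(w, data, num_workers, lr):
    """Each worker computes a gradient on its shard; the results are averaged."""
    grads = [local_gradient(w, s) for s in shard(data, num_workers)]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

data = [(x, 3.0 * x) for x in range(1, 9)]  # targets follow y = 3x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
print(round(w, 3))  # converges toward 3.0
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so adding workers changes how the work is divided but not the update itself.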
In November 2022, Cerebras unveiled its Andromeda supercomputer, which combines 16 WSE-2 chips into one cluster with 13.5 million AI-optimized cores, delivering up to 1 exaFLOP of AI computing horsepower, or at least one quintillion (10 to the power of 18) operations per second. [38] [39] The entire system consumes 500 kW, drastically less than somewhat-comparable GPU-accelerated supercomputers. [38]
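The cluster figures quoted above can be checked with simple arithmetic. The sketch below multiplies the WSE-2 core count stated in the text by the number of systems, and divides the quoted peak rate by the quoted power draw. It assumes one wafer per system; the helper name is hypothetical. The nominal product for 16 systems is 13.6 million cores, whereas the Andromeda figure reported above is 13.5 million; the source does not explain the difference.

```python
CORES_PER_WSE2 = 850_000       # WSE-2 core count stated in the text
PEAK_OPS_PER_SECOND = 10**18   # "1 exaFLOP" quoted for Andromeda
POWER_WATTS = 500_000          # 500 kW quoted for Andromeda

def cluster_cores(num_systems, cores_per_chip=CORES_PER_WSE2):
    """Nominal total cores, assuming one wafer per system (hypothetical helper)."""
    return num_systems * cores_per_chip

print(cluster_cores(16))    # 13600000
print(cluster_cores(192))   # 163200000

# Peak operations per second per watt implied by the quoted figures.
ops_per_watt = PEAK_OPS_PER_SECOND // POWER_WATTS
print(ops_per_watt)         # 2000000000000
```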
In November 2022, Cerebras announced its partnership with Cirrascale Cloud Services to provide flat-rate "pay-per-model" compute time for its Cerebras AI Model Studio. The service is said to cost half as much as similar cloud services on the market while running up to eight times faster. [40]
In July 2023, Cerebras and UAE-based G42 unveiled the world's largest network of nine interlinked supercomputers, Condor Galaxy, for AI model training. The first supercomputer, named Condor Galaxy 1 (CG-1), boasts 4 exaFLOPs of FP16 performance and 54 million cores. [41] In November 2023, the Condor Galaxy 2 (CG-2) was announced, also containing 4 exaFLOPs and 54 million cores.
In March 2024, the companies broke ground on the Condor Galaxy 3 (CG-3), which can hit 8 exaFLOPs of performance and contains 58 million AI-optimized cores. [42]
In March 2024, the company also introduced WSE-3, a 5 nm-based chip hosting 4 trillion transistors and 900,000 AI-optimized cores, the basis of the CS-3 computer. Cerebras also announced a collaboration with Dell Technologies, unveiled in June 2024, for AI compute infrastructure for generative AI. [43]
In August 2024, Cerebras unveiled its AI inference service, which it claimed was the fastest in the world and, in many cases, ten to twenty times faster than systems built using the dominant technology, Nvidia's H100 "Hopper" graphics processing unit (GPU). [44]
As of October 2024, Cerebras' reported performance advantage for inference was even larger when running the Llama 3.2 models. Inference performance rose by a factor of 3.5 between August and October, widening the gap for Cerebras CS-3 systems running on premises or in clouds operated by Cerebras. [45]
Customers are reportedly using Cerebras technologies in the hyperscale, pharmaceutical, life sciences, and energy sectors, among others. [46] [47]
In 2020, GlaxoSmithKline (GSK) began using the Cerebras CS-1 AI system in its London AI hub, for neural network models to accelerate genetic and genomic research and reduce the time taken in drug discovery. [48] The GSK research team was able to increase the complexity of the encoder models they could generate, while reducing training time. [49] Other pharmaceutical industry customers include AstraZeneca, which was able to reduce training time from two weeks on a cluster of GPUs to two days using the Cerebras CS-1 system. [50] In December 2021, GSK and Cerebras co-published research on epigenomic language models.
Argonne National Laboratory has been using the CS-1 since 2020 in COVID-19 research and cancer tumor research based on the world's largest cancer treatment database. [51] A series of models running on the CS-1 to predict cancer drug response to tumors achieved speed-ups of many hundreds of times on the CS-1 compared to their GPU baselines. [46]
Cerebras and the National Energy Technology Laboratory (NETL) demonstrated record-breaking performance of Cerebras' CS-1 system on a scientific compute workload in November 2020. The CS-1 was 200 times faster than the Joule Supercomputer on the key workload of Computational Fluid Dynamics. [52]
The Lawrence Livermore National Lab’s Lassen supercomputer incorporated the CS-1 in both classified and non-classified areas for physics simulations. [53] The Pittsburgh Supercomputing Center (PSC) has also incorporated the CS-1 in their Neocortex supercomputer for dual HPC and AI workloads. [54] EPCC, the supercomputing center of the University of Edinburgh, has also deployed a CS-1 system for AI-based research. [55]
In August 2021, Cerebras announced a partnership with Peptilogics on the development of AI for peptide therapeutics. [56]
In March 2022, Cerebras announced that it had deployed its CS-2 system in the Houston facilities of TotalEnergies, its first publicly disclosed customer in the energy sector. [47] Cerebras also announced that it had deployed a CS-2 system at nference, a startup that uses natural language processing to analyze massive amounts of biomedical data. The CS-2 was to be used to train transformer models designed to process large volumes of unstructured medical data, with the aim of providing fresh insights to doctors and improving patient recovery and treatment. [57]
In May 2022, Cerebras announced that the National Center for Supercomputing Applications (NCSA) had deployed the Cerebras CS-2 system in its HOLL-I supercomputer. [58] The company also announced that the Leibniz Supercomputing Centre (LRZ) in Germany planned to deploy a new supercomputer featuring the CS-2 system along with the HPE Superdome Flex server. [59] The new supercomputing system was expected to be delivered to LRZ in the summer of 2022, as the first CS-2 system deployment in Europe. [59]
In October 2022, it was announced that the U.S. National Nuclear Security Administration would sponsor a study to investigate using Cerebras' CS-2 in nuclear stockpile stewardship computing. [60] [61] The multi-year contract will be executed through Sandia National Laboratories, Lawrence Livermore National Lab, and Los Alamos National Laboratory. [60]
In November 2022, Cerebras and the National Energy Technology Laboratory (NETL) saw record-breaking performance on the scientific compute workload of forming and solving field equations. Cerebras demonstrated that its CS-2 system was as much as 470 times faster than NETL's Joule Supercomputer in field equation modeling. [62]
The winning team of the 2022 Gordon Bell Special Prize for HPC-Based COVID-19 Research, which honors outstanding research achievement towards the understanding of the COVID-19 pandemic through the use of high-performance computing, used Cerebras' CS-2 system in its research on adapting large language models to analyze COVID-19 variants. The paper was authored by a 34-person team from Argonne National Laboratory, California Institute of Technology, Harvard University, Northern Illinois University, Technical University of Munich, University of Chicago, University of Illinois Chicago, Nvidia, and Cerebras. ANL noted that using the CS-2 Wafer-Scale Engine cluster, the team was able to achieve convergence when training on the full SARS-CoV-2 genomes in less than a day. [63] [64]
Cerebras partnered with Emirati technology group G42 to deploy its AI supercomputers to create chatbots and to analyze genomic and preventive care data. In July 2023, G42 agreed to pay around $100 million to purchase the first of potentially nine supercomputers from Cerebras, each capable of 4 exaflops of compute. [65] [66] [67] In August 2023, Cerebras, the Mohamed bin Zayed University of Artificial Intelligence and G42 subsidiary Inception launched Jais, a large language model. [68]
Mayo Clinic announced a collaboration with Cerebras at the 2024 J.P. Morgan Healthcare Conference, offering details on the first foundation model it will develop with the enablement of Cerebras's generative AI computing capability. The solution will combine genomic data with de-identified data from patient records and medical evidence to explore the ability to predict a patient's response to treatments to manage disease and will initially be applied to rheumatoid arthritis. The model could serve as a prototype for similar solutions to support the diagnosis and treatment of other diseases.
In May 2024, Cerebras, in collaboration with researchers from Sandia National Laboratories, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and the National Nuclear Security Administration, reported molecular dynamics simulations in which the team simulated 800,000 atoms interacting with each other, calculating the interactions in increments of one femtosecond. Each step took just microseconds to compute on the Cerebras WSE-2. Although that is still 9 orders of magnitude slower than the actual interactions, it was 179 times as fast as the Frontier supercomputer. The achievement effectively reduced a year's worth of computation to just two days. [69]
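The two ratios quoted for the molecular dynamics result can be reproduced with a short calculation. The sketch below is a rough check only: the text says each step took "microseconds", and the code assumes exactly one microsecond per step, which is an assumption rather than a reported figure.

```python
import math

SIMULATED_STEP_S = 1e-15    # one femtosecond of simulated time per step (from the text)
COMPUTE_STEP_S = 1e-6       # assumption: "microseconds" taken as one microsecond
SPEEDUP_VS_FRONTIER = 179   # speed-up over Frontier quoted in the text

# How much slower the computation runs than the physical process it models.
slowdown = COMPUTE_STEP_S / SIMULATED_STEP_S
orders_of_magnitude = round(math.log10(slowdown))
print(orders_of_magnitude)  # 9

# A year of computation at the baseline speed, divided by the speed-up.
days_after_speedup = 365 / SPEEDUP_VS_FRONTIER
print(round(days_after_speedup, 2))  # about 2 days
```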
In March 2024, Cerebras introduced the CS-3 and third-generation Wafer Scale Engine (WSE-3), which represents the latest development of its technology. It has twice the performance of the CS-2 and hosts 900,000 cores. A CS-3 cluster is capable of training an AI model like Llama2-70B in a single day. [70] The WSE-3 was recognized by TIME Magazine as a Best Invention of 2024. [71]
A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2022, supercomputers have existed which can perform over 10^18 FLOPS, so-called exascale supercomputers. For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
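To put the scales above side by side, the following sketch converts prefixed FLOPS values into plain numbers and compares an exascale machine with the desktop range given in the text. The helper name and the prefix table are illustrative assumptions; integers are used so that the ratios are exact.

```python
# SI prefixes relevant to the FLOPS figures quoted above.
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15, "exa": 10**18}

def to_flops(value, prefix):
    """Convert a prefixed FLOPS figure to plain FLOPS (hypothetical helper)."""
    return value * PREFIXES[prefix]

exascale = to_flops(1, "exa")          # 10^18 FLOPS
desktop_low = to_flops(100, "giga")    # hundreds of gigaFLOPS, about 10^11
desktop_high = to_flops(10, "tera")    # tens of teraFLOPS, about 10^13

print(exascale // desktop_low)    # 10000000
print(exascale // desktop_high)   # 100000
```

On these figures, an exascale system is roughly five to seven orders of magnitude faster than a desktop computer.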
Blue Gene was an IBM project aimed at designing supercomputers that can reach operating speeds in the petaFLOPS (PFLOPS) range, with relatively low power consumption.
David A. Bader is a Distinguished Professor and Director of the Institute for Data Science at the New Jersey Institute of Technology. Previously, he served as the Chair of the Georgia Institute of Technology School of Computational Science & Engineering, where he was also a founding professor, and the executive director of High-Performance Computing at the Georgia Tech College of Computing. In 2007, he was named the first director of the Sony Toshiba IBM Center of Competence for the Cell Processor at Georgia Tech.
Wafer-scale integration (WSI) is a system of building very-large integrated circuit networks from an entire silicon wafer to produce a single "super-chip". Combining large size and reduced packaging, WSI was expected to lead to dramatically reduced costs for some systems, notably massively parallel supercomputers but is now being employed for deep learning. The name is taken from the term very-large-scale integration, the state of the art when WSI was being developed.
The transistor count is the number of transistors in an electronic device. It is the most common measure of integrated circuit complexity. The rate at which MOS transistor counts have increased generally follows Moore's law, which observes that transistor count doubles approximately every two years. However, being directly proportional to the area of a die, transistor count does not represent how advanced the corresponding manufacturing technology is. A better indication of this is transistor density which is the ratio of a semiconductor's transistor count to its die area.
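The distinction between transistor count and transistor density can be made concrete with a short calculation. In the sketch below, the 2.6 trillion transistor count for the WSE-2 comes from the article, but the die area of 46,225 mm² is an assumption that is not stated in this text, and the conventional chip is entirely hypothetical.

```python
def transistor_density(transistor_count, die_area_mm2):
    """Transistors per square millimetre: count divided by die area."""
    return transistor_count / die_area_mm2

# Wafer-scale part: count from the article, area is an assumption.
wafer_scale = transistor_density(2.6e12, 46_225)

# Hypothetical conventional chip, chosen only for comparison.
conventional = transistor_density(50e9, 800)

print(round(wafer_scale / 1e6, 1))    # about 56.2 million per mm^2
print(round(conventional / 1e6, 1))   # 62.5 million per mm^2
```

Under these assumptions, the wafer-scale part has about 52 times the transistor count of the hypothetical chip but a similar density, which is why count alone says little about the manufacturing process.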
The Pittsburgh Supercomputing Center (PSC) is a high performance computing and networking center founded in 1986 and one of the original five NSF Supercomputing Centers. PSC is a joint effort of Carnegie Mellon University and the University of Pittsburgh in Pittsburgh, Pennsylvania, United States.
The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL benchmarks, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.
Blue Waters was a petascale supercomputer operated by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. On August 8, 2007, the National Science Board approved a resolution which authorized the National Science Foundation to fund "the acquisition and deployment of the world's most powerful leadership-class supercomputer." The NSF awarded $208 million for the Blue Waters project.
Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores. Manycore processors are used extensively in embedded computers and high-performance computing.
This list compares various amounts of computing power in instructions per second organized by order of magnitude in FLOPS.
Zero ASIC Corporation, formerly Adapteva, Inc., is a fabless semiconductor company focusing on low power many core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.
Japan operates a number of centers for supercomputing which hold world records in speed, with the K computer being the world's fastest from June 2011 to June 2012, and Fugaku holding the lead from June 2020 until June 2022.
Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.
An AI accelerator, deep learning processor or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and computer vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.
Graphcore Limited is a British semiconductor company that develops accelerators for AI and machine learning. It has introduced a massively parallel Intelligence Processing Unit (IPU) that holds the complete machine learning model inside the processor.
The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu. The processor is replacing the SPARC64 V as Fujitsu's processor for supercomputer applications. It powers the Fugaku supercomputer, ranked in the TOP500 as the fastest supercomputer in the world from June 2020, until falling to second place behind Frontier in June 2022.
Group 42 Holding Ltd, doing business as G42, is an Emirati artificial intelligence (AI) development holding company based in Abu Dhabi, founded in 2018. The organization is focused on AI development across various industries including government, healthcare, finance, oil and gas, aviation, and hospitality. Tahnoun bin Zayed Al Nahyan, the UAE's national security advisor, is the controlling shareholder and chairs the company. Because G42 had strong ties to China, U.S. authorities have been concerned that G42 serves as a channel through which sophisticated U.S. technology is diverted to Chinese companies or the government. As of February 2024, G42 had divested its stakes in China.
JUWELS is a supercomputer developed by Atos and hosted by the Jülich Supercomputing Centre (JSC) of the Forschungszentrum Jülich.
Selene is a supercomputer developed by Nvidia, capable of achieving 63.460 petaflops, which ranked it as the fifth fastest supercomputer in the world when it entered the list. Selene is based on the Nvidia DGX SuperPOD, a high-performance turnkey supercomputer solution provided by Nvidia using DGX hardware, consisting of AMD CPUs, Nvidia A100 GPUs, and Mellanox HDR networking. DGX SuperPOD is a tightly integrated system that combines high-performance DGX compute nodes with fast storage and high-bandwidth networking. It aims to provide a turnkey solution to high-demand machine learning workloads. Selene was built in three months and is the fastest industrial system in the US while being the second-most energy-efficient supercomputing system ever.
Tesla Dojo is a supercomputer designed and built by Tesla for computer vision video processing and recognition. It is used for training Tesla's machine learning models to improve its Full Self-Driving (FSD) advanced driver-assistance system. According to Tesla, it went into production in July 2023.