General information | |
---|---|
Launched | 2019 |
Marketed by | Fujitsu |
Designed by | Fujitsu |
Common manufacturer | |
Architecture and classification | |
Technology node | 7 nm |
Microarchitecture | In-house |
Instruction set | ARMv8.2-A with SVE and SBSA level 3 |
Physical specifications | |
Cores | |
History | |
Predecessor | SPARC64 V |
The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu. [1] [4] The processor is replacing the SPARC64 V as Fujitsu's processor for supercomputer applications. [5] It powers the Fugaku supercomputer, ranked in the TOP500 as the fastest supercomputer in the world from June 2020, until falling to second place behind Frontier in June 2022. [6] [4] [5] [7]
Fujitsu collaborated with ARM to develop the processor; it is the first processor to use the ARMv8.2-A Scalable Vector Extension SIMD instruction set with 512-bit vector implementation. [4]
It has "Four-operand FMA with Prefix Instruction", [1] i.e. MOVPRFX instruction followed by 3-operand FMA operation (ARM, like RISC in general, is a 3-operand machine, with no space for four operands), which get packed into a single operation in the pipeline. For the processor the designer claim ">90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product". [1]
The processor uses 32 gigabytes of HBM2 memory with a bandwidth of 1 TB per second. [4] The processor contains 16 PCI Express generation 3 lanes [1] to connect to accelerators (hypothetical e.g. GPUs and FPGAs). The processor also integrates a TofuD fabric controller with 10 ports implemented as 20 lanes of high-speed 28 Gbps to connect multiple nodes in a cluster. [1] The reported transistor count is about 8.8 billion. [4]
Each A64FX processor has four NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor. [8] [2] [3] Each NUMA node has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes. [8]
Fujitsu intends to produce lower specification machines with reduced assistant cores. [2] [3] Reliability, availability and serviceability (RAS) capabilities are claimed, i.e. ~128,400 error checkers in total.
In June 2020 the Fugaku supercomputer using this processor reached 442 petaFLOPS and became the fastest supercomputer in the world.
Fujitsu designed the A64FX for the Fugaku. As of June and November 2020, the Fugaku is the fastest supercomputer in the world by TOP500 rankings. [9] Fujitsu intends to sell smaller machines with A64FX processors. [2] [3] Anandtech reported in June 2020 that the cost of a PRIMEHPC FX700 server, with two A64FX nodes, was ¥4,155,330 (c. US$39,000). [10]
Cray is developing supercomputers using the A64FX. [11] [12] The Isambard 2 supercomputer is being built for a consortium in the United Kingdom, led by the University of Bristol and also including the Met Office, using the Fujitsu processors. [13] [14] It is an upgrade to the Isambard supercomputer which was built with the Marvell ThunderX2, another ARM architecture microprocessor. [14]
Ookami is an open testbed system supported by NSF run by Stony Brook University and the University at Buffalo providing researchers access to A64FX processors.
SPARC is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed in the early 1980s. First developed in 1986 and released in 1987, SPARC was one of the most successful early commercial RISC systems, and its success led to the introduction of similar RISC designs from many vendors through the 1980s and 1990s.
Floating point operations per second is a measure of computer performance in computing, useful in fields of scientific computations that require floating-point calculations.
In computing, especially digital signal processing, the multiply–accumulate (MAC) or multiply-add (MAD) operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier–accumulator ; the operation itself is also often called a MAC or a MAD operation. The MAC operation modifies an accumulator a:
The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL benchmarks, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.
Arm Holdings plc is a British semiconductor and software design company based in Cambridge, England, whose primary business is the design of central processing unit (CPU) cores that implement the ARM architecture family of instruction sets. It also designs other chips, provides software development tools under the DS-5, RealView and Keil brands, and provides systems and platforms, system-on-a-chip (SoC) infrastructure and software. As a "holding" company, it also holds shares of other companies. Since 2016, it has been majority owned by Japanese conglomerate SoftBank Group.
The SPARC64 V (Zeus) is a SPARC V9 microprocessor designed by Fujitsu. The SPARC64 V was the basis for a series of successive processors designed for servers, and later, supercomputers.
Exascale computing refers to computing systems capable of calculating at least "1018 IEEE 754 Double Precision (64-bit) operations (multiplications and/or additions) per second (exaFLOPS)"; it is a measure of supercomputer performance.
The K computer – named for the Japanese word/numeral "kei" (京), meaning 10 quadrillion (1016) – was a supercomputer manufactured by Fujitsu, installed at the Riken Advanced Institute for Computational Science campus in Kobe, Hyōgo Prefecture, Japan. The K computer was based on a distributed memory architecture with over 80,000 compute nodes. It was used for a variety of applications, including climate research, disaster prevention and medical research. The K computer's operating system was based on the Linux kernel, with additional drivers designed to make use of the computer's hardware.
Japan operates a number of centers for supercomputing which hold world records in speed, with the K computer being the world's fastest from June 2011 to June 2012, and Fugaku holding the lead from June 2020 until June 2022.
Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.
The PRIMEHPC FX10 is a supercomputer designed and manufactured by Fujitsu. Announced on 7 November 2011 at the Supercomputing Conference, the PRIMEHPC FX10 is an improved and commercialized version of the K computer, which was the first supercomputer to obtain more than 10 PFLOPS on the LINPACK benchmark. In its largest configuration, the PRIMEHPC FX10 has a peak performance 23.2 PFLOPS, power consumption of 22.4 MW, and a list price of US$655.4 million. It was succeeded by the PRIMEHPC FX100 with SPARC64 XIfx processors in 2015.
The Graph500 is a rating of supercomputer systems, focused on data-intensive loads. The project was announced on International Supercomputing Conference in June 2010. The first list was published at the ACM/IEEE Supercomputing Conference in November 2010. New versions of the list are published twice a year. The main performance metric used to rank the supercomputers is GTEPS.
AArch64 or ARM64 is the 64-bit Execution state of the ARM architecture family. It was first introduced with the Armv8-A architecture, and has had many extension updates.
This is a comparison of ARM instruction set architecture application processor cores designed by ARM Holdings and 3rd parties. It does not include ARM Cortex-R, ARM Cortex-M, or legacy ARM cores.
AMD Instinct is AMD's brand of data center GPUs. It replaced AMD's FirePro S brand in 2016. Compared to the Radeon brand of mainstream consumer/gamer products, the Instinct product line is intended to accelerate deep learning, artificial neural network, and high-performance computing/GPGPU applications.
Torus fusion (tofu) is a proprietary computer network topology for supercomputers developed by Fujitsu. It is a variant of the torus interconnect. The system has been used in the K computer and the Fugaku supercomputer.
Fugaku(Japanese: 富岳) is a petascale supercomputer at the Riken Center for Computational Science in Kobe, Japan. It started development in 2014 as the successor to the K computer and made its debut in 2020. It is named after an alternative name for Mount Fuji.
Aurora is an exascale supercomputer that was sponsored by the United States Department of Energy (DOE) and designed by Intel and Cray for the Argonne National Laboratory. It has been the second fastest supercomputer in the world since 2023. It is expected that after optimizing its performance it will exceed 2 ExaFLOPS, making it the fastest computer ever.
AWS Graviton is a family of 64-bit ARM-based CPUs designed by the Amazon Web Services (AWS) subsidiary Annapurna Labs. The processor family is distinguished by its lower energy use relative to x86-64, static clock rates, and omission of simultaneous multithreading. It was designed to be tightly integrated with AWS servers and datacenters, and is not sold outside Amazon.