Fujitsu A64FX

A64FX
General information
Launched 2019
Marketed by Fujitsu
Designed by Fujitsu
Common manufacturer
Architecture and classification
Technology node 7 nm
Microarchitecture In-house
Instruction set ARMv8.2-A with SVE and SBSA level 3
Physical specifications
Cores
  • 48 per CPU [1] plus optional assistant cores [2] [3]
History
Predecessor SPARC64 V

The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu. [1] [4] It replaces the SPARC64 V as Fujitsu's processor for supercomputer applications. [5] It powers the Fugaku supercomputer, which the TOP500 ranked as the fastest supercomputer in the world from June 2020 until it fell to second place behind Frontier in June 2022. [6] [4] [5] [7]

Design

Fujitsu collaborated with ARM to develop the processor; it is the first processor to use the ARMv8.2-A Scalable Vector Extension (SVE) SIMD instruction set, with a 512-bit vector implementation. [4]

It supports a "Four-operand FMA with Prefix Instruction": [1] a MOVPRFX instruction followed by a three-operand FMA operation, with the pair packed into a single operation in the pipeline. (ARM, like RISC architectures in general, is a three-operand machine, with no encoding space for four operands.) For the processor, the designers claim ">90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product". [1]
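The effect of the prefix pairing can be illustrated with a small model. The following is only a sketch of the semantics described above, not of the hardware or of real SVE code: the function names `movprfx`, `fmla` and `four_operand_fma` are hypothetical, vectors are modelled as plain Python lists, and the single rounding of a true fused multiply-add is not reproduced.

```python
# Illustrative model only: registers are Python lists, names are hypothetical.

def movprfx(src):
    """Model the prefix move: copy the source register into a new destination."""
    return list(src)

def fmla(acc, a, b):
    """Model a destructive three-operand FMA: acc <- acc + a * b, element-wise.

    Note: a real FMA rounds once; this model rounds after the multiply
    and again after the add.
    """
    return [x + y * z for x, y, z in zip(acc, a, b)]

def four_operand_fma(a, b, c):
    """Compute d = c + a * b without overwriting a, b or c.

    The prefix move supplies the fourth (destination) operand that a
    three-operand encoding cannot name directly.
    """
    d = movprfx(c)      # d <- c
    return fmla(d, a, b)  # d <- d + a * b

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
c = [0.5, 0.5, 0.5, 0.5]
d = four_operand_fma(a, b, c)
print(d)  # [10.5, 20.5, 30.5, 40.5]
print(c)  # [0.5, 0.5, 0.5, 0.5] (the addend is preserved)
```

In this model the two steps are separate function calls; on the processor, as described above, the pair is packed into a single operation in the pipeline.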

The processor uses 32 gigabytes of HBM2 memory with a bandwidth of 1 TB per second. [4] It provides 16 PCI Express generation 3 lanes [1] for connecting accelerators (hypothetically, GPUs or FPGAs). It also integrates a TofuD fabric controller with 10 ports, implemented as 20 high-speed lanes of 28 Gbit/s each, to connect multiple nodes in a cluster. [1] The reported transistor count is about 8.8 billion. [4]
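As a rough back-of-the-envelope check, the interconnect and memory figures quoted above can be combined as follows. This is a sketch under stated assumptions: the results are raw signalling rates that ignore encoding and protocol overhead, "1 TB per second" is taken as 1000 GB/s, and the 48-core count is the one given elsewhere in this article. The variable names are illustrative only.

```python
# Figures quoted in the article.
TOFU_PORTS = 10
TOFU_LANES = 20
LANE_GBIT_S = 28            # per-lane signalling rate in Gbit/s
HBM2_BANDWIDTH_GB_S = 1000  # assumption: "1 TB per second" read as 1000 GB/s
CORES = 48                  # compute cores per processor

# Derived values (raw rates, no encoding or protocol overhead).
lanes_per_port = TOFU_LANES // TOFU_PORTS        # lanes behind each port
port_gbit_s = lanes_per_port * LANE_GBIT_S       # raw rate per port
total_gbit_s = TOFU_LANES * LANE_GBIT_S          # raw rate over all lanes
total_gbyte_s = total_gbit_s / 8                 # same figure in GB/s
hbm2_gb_s_per_core = HBM2_BANDWIDTH_GB_S / CORES # memory bandwidth share per core

print(lanes_per_port)                # 2
print(port_gbit_s)                   # 56
print(total_gbit_s)                  # 560
print(total_gbyte_s)                 # 70.0
print(round(hbm2_gb_s_per_core, 1))  # 20.8
```

Achievable link throughput will be lower than these raw figures, since they do not account for line coding or packet overhead.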

Each A64FX processor has four NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor. [8] [2] [3] Each NUMA node has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes. [8]
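The topology arithmetic can be sketched as a simple mapping from compute core to NUMA node. This is a minimal illustration, assuming compute cores are numbered contiguously from 0 and that the 32 GB of HBM2 is split evenly across the four nodes; actual core numbering on a given system may differ (for example, if assistant cores occupy the lowest IDs), and the helper names are hypothetical.

```python
NUMA_NODES = 4
CORES_PER_NODE = 12
TOTAL_CORES = NUMA_NODES * CORES_PER_NODE  # 48 compute cores per processor
HBM2_TOTAL_GB = 32

def numa_node_of(core_id):
    """Return the NUMA node of a compute core (assumes contiguous numbering from 0)."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in 0..{TOTAL_CORES - 1}")
    return core_id // CORES_PER_NODE

def cores_in_node(node):
    """Return the compute core IDs that belong to a NUMA node."""
    if not 0 <= node < NUMA_NODES:
        raise ValueError(f"node must be in 0..{NUMA_NODES - 1}")
    start = node * CORES_PER_NODE
    return list(range(start, start + CORES_PER_NODE))

# Assumption: HBM2 capacity is divided evenly between the NUMA nodes.
hbm2_gb_per_node = HBM2_TOTAL_GB / NUMA_NODES

print(TOTAL_CORES)        # 48
print(numa_node_of(13))   # 1
print(cores_in_node(3))   # [36, 37, ..., 47]
print(hbm2_gb_per_node)   # 8.0
```

A mapping of this kind is what a job launcher or thread-pinning policy would use to keep a thread's memory accesses within its own node's level 2 cache and HBM2.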

Fujitsu intends to produce lower-specification machines with fewer assistant cores. [2] [3] Fujitsu also claims reliability, availability and serviceability (RAS) capabilities for the processor, including about 128,400 error checkers in total.

The Fugaku supercomputer, which uses this processor, became the fastest supercomputer in the world in June 2020 and reached 442 petaFLOPS on the TOP500 list of November 2020.

Implementations

Fujitsu designed the A64FX for the Fugaku, which was the fastest supercomputer in the world by TOP500 rankings in June and November 2020. [9] Fujitsu intends to sell smaller machines with A64FX processors. [2] [3] AnandTech reported in June 2020 that the cost of a PRIMEHPC FX700 server, with two A64FX nodes, was ¥4,155,330 (c. US$39,000). [10]

Cray is developing supercomputers using the A64FX. [11] [12] The Isambard 2 supercomputer, which uses the Fujitsu processors, is being built for a consortium in the United Kingdom led by the University of Bristol and including the Met Office. [13] [14] It is an upgrade to the Isambard supercomputer, which was built with the Marvell ThunderX2, another ARM architecture microprocessor. [14]

Ookami is an open testbed system, supported by the NSF and run by Stony Brook University and the University at Buffalo, that gives researchers access to A64FX processors.

References

  1. "Hot Chips 30 conference; Fujitsu briefing" (PDF). Toshio Yoshida. Archived from the original (PDF) on 5 December 2020.
  2. "Fujitsu Launches New PRIMEHPC Supercomputers Using Fugaku Technology - Fujitsu Global". www.fujitsu.com. 13 November 2019. Retrieved 28 June 2020.
  3. "FUJITSU Supercomputer PRIMEHPC Specifications". www.fujitsu.com. Retrieved 28 June 2020.
  4. "Fujitsu Successfully Triples the Power Output of Gallium-Nitride Transistors - Fujitsu Global". www.fujitsu.com. Fujitsu. Retrieved 8 March 2020.
  5. Morgan, Timothy Prickett (24 August 2018). "Fujitsu's A64FX Arm Chip Waves The HPC Banner High". The Next Platform. Retrieved 8 March 2020.
  6. "June 2022 | TOP500". www.top500.org. Retrieved 23 June 2023.
  7. "Outline of the Development of the Supercomputer Fugaku | RIKEN Center for Computational Science RIKEN Website". www.r-ccs.riken.jp. Archived from the original on 23 January 2021. Retrieved 18 November 2020.
  8. Odajima, Tetsuya; Kodama, Yuetsu; Tsuji, Miwako; Matsuda, Motohiko; Maruyama, Yutaka; Sato, Mitsuhisa (September 2020). "Preliminary Performance Evaluation of the Fujitsu A64FX Using HPC Applications". 2020 IEEE International Conference on Cluster Computing (CLUSTER). pp. 523–530. doi:10.1109/CLUSTER49012.2020.00075. ISBN 978-1-7281-6677-3. S2CID 226266547.
  9. "Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D | TOP500". www.top500.org. Retrieved 18 November 2020.
  10. Cutress, Ian (26 June 2020). "HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k". www.anandtech.com. Retrieved 28 June 2020.
  11. "Cray, Fujitsu Both Bringing Fujitsu A64FX-based Supercomputers to Market in 2020". HPCwire. 13 November 2019. Retrieved 8 March 2020.
  12. Tsukimori, Osamu (7 January 2021). "Japan's Fugaku supercomputer is tackling some of the world's biggest problems". The Japan Times. Retrieved 26 January 2021.
  13. University of Bristol. "February: GW4 Isambard - News and features - University of Bristol". www.bristol.ac.uk. Retrieved 8 March 2020.
  14. Burt, Jeffrey (9 March 2020). "Isambard 2 Is About Driving Technology Diversity". The Next Platform. Retrieved 9 March 2020.