Fujitsu A64FX


A64FX
General information
Launched: 2019
Marketed by: Fujitsu
Designed by: Fujitsu
Architecture and classification
Technology node: 7 nm
Microarchitecture: In-house
Instruction set: ARMv8.2-A with SVE and SBSA level 3
Physical specifications
Cores: 48 per CPU [1] plus optional assistant cores [2] [3]
History
Predecessor(s): SPARC64 V

The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu. [1] [4] It replaces the SPARC64 V family as Fujitsu's processor for supercomputer applications. [5] It powers the Fugaku supercomputer, which the TOP500 ranked as the fastest supercomputer in the world from June 2020 until June 2022, when it fell to second place behind Frontier. [6] [4] [5] [7]


Design

Fujitsu collaborated with Arm to develop the processor. It is the first processor to implement the ARMv8.2-A Scalable Vector Extension (SVE) SIMD instruction set, using 512-bit vectors. [4]
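As a quick illustration of what a 512-bit vector means in practice, the sketch below counts how many elements of each width fit in one vector register. The helper name `lanes` and the element-type table are hypothetical; 512 bits is the A64FX choice, while SVE itself lets implementations pick other vector lengths.

```python
# Hypothetical helper: elements per SVE vector register for a given width.
# 512 bits is the A64FX implementation; other SVE implementations may differ.
VECTOR_BITS = 512

ELEMENT_BITS = {
    "fp64": 64,
    "fp32": 32,
    "fp16": 16,
    "int8": 8,
}


def lanes(vector_bits: int, element_bits: int) -> int:
    """Return how many elements of `element_bits` fit in one vector."""
    return vector_bits // element_bits


for name, bits in ELEMENT_BITS.items():
    print(f"{name}: {lanes(VECTOR_BITS, bits)} lanes")
```

This prints 8 lanes for fp64, 16 for fp32, 32 for fp16 and 64 for int8.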

It supports what Fujitsu calls "Four-operand FMA with Prefix Instruction": [1] a MOVPRFX instruction followed by a three-operand fused multiply-add (FMA) instruction, which the pipeline combines into a single operation. The prefix is needed because the predicated SVE FMA instructions are destructive: the encoding names only three vector registers, so the accumulator is also the destination and is overwritten. Fujitsu claims ">90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product" for the processor. [1]
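The MOVPRFX-plus-FMA pairing can be modeled in a few lines. The sketch below is a hypothetical software model of the instruction semantics only (Python lists stand in for Z vector registers and a predicate register); it is not real intrinsics code and does not model the pipeline fusion itself. It shows why the prefix matters: the FMA overwrites its accumulator in place, and copying the addend first yields a four-operand result while leaving the original addend intact.

```python
from typing import List


def fmla(zda: List[float], pg: List[bool], zn: List[float], zm: List[float]) -> None:
    """Model of a destructive, predicated FMA: zda[i] += zn[i] * zm[i] where pg[i].

    Inactive lanes keep their previous value (merging predication).
    """
    for i in range(len(zda)):
        if pg[i]:
            zda[i] += zn[i] * zm[i]


def movprfx(zsrc: List[float]) -> List[float]:
    """Model of an unpredicated prefix: copy the source into a new destination."""
    return list(zsrc)


def fma4(za: List[float], pg: List[bool], zn: List[float], zm: List[float]) -> List[float]:
    """Four-operand effect zd = za + zn * zm without modifying za."""
    zd = movprfx(za)       # MOVPRFX zd, za
    fmla(zd, pg, zn, zm)   # FMLA zd, pg/m, zn, zm
    return zd


za = [1.0, 1.0, 1.0, 1.0]
zn = [2.0, 3.0, 4.0, 5.0]
zm = [10.0, 10.0, 10.0, 10.0]
pg = [True, True, False, True]

result = fma4(za, pg, zn, zm)
print(result)  # [21.0, 31.0, 1.0, 51.0]
print(za)      # [1.0, 1.0, 1.0, 1.0]
```

The inactive third lane keeps the addend's value, and `za` survives unchanged, which is exactly what a plain destructive FMA would not guarantee.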

The processor uses 32 gigabytes of HBM2 memory with a bandwidth of 1 TB per second. [4] It has 16 PCI Express 3.0 lanes [1] for connecting potential accelerators such as GPUs and FPGAs. It also integrates a TofuD interconnect controller with 10 ports, implemented as 20 high-speed lanes at 28 Gbit/s, for connecting multiple nodes in a cluster. [1] The reported transistor count is about 8.8 billion. [4]
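The TofuD figures can be turned into rough per-port and aggregate numbers. This is back-of-the-envelope arithmetic under stated assumptions, not a Fujitsu specification: it assumes the 20 lanes are split evenly across the 10 ports and treats 28 Gbit/s as a raw signaling rate, ignoring encoding and protocol overhead. The source also does not say whether the lane rate is per direction.

```python
# Rough TofuD arithmetic. Assumptions (not from the source): lanes are split
# evenly across ports, and 28 Gbit/s is a raw rate with no overhead removed.
PORTS = 10
LANES = 20
GBPS_PER_LANE = 28  # gigabits per second per lane

lanes_per_port = LANES // PORTS              # 2 lanes per port
port_gbps = lanes_per_port * GBPS_PER_LANE   # 56 Gbit/s per port
total_gbps = LANES * GBPS_PER_LANE           # 560 Gbit/s across all lanes

print(f"{lanes_per_port} lanes/port, {port_gbps} Gbit/s per port "
      f"(~{port_gbps / 8:.0f} GB/s), {total_gbps} Gbit/s aggregate "
      f"(~{total_gbps / 8:.0f} GB/s)")
```

Under these assumptions each port carries about 7 GB/s of raw signaling, about 70 GB/s for the whole controller; delivered bandwidth would be lower.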

Each A64FX processor has four NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor. [8] [2] [3] Each NUMA node has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes. [8]
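The sketch below divides the chip-wide totals given in this article across the four NUMA nodes and maps a compute-core index to its node. It assumes, hypothetically, an even split of memory and bandwidth and consecutive core numbering (cores 0 to 11 on node 0, and so on); real operating-system numbering may differ, and assistant cores are not modeled. The 1 TB/s figure is taken as the article's rounded value of 1000 GB/s.

```python
# Hypothetical model: even split of memory and bandwidth across NUMA nodes,
# and compute cores numbered 0-47 consecutively, 12 per node.
NUMA_NODES = 4
CORES_PER_NODE = 12
HBM2_GB = 32
HBM2_GBPS = 1000  # the article's rounded "1 TB per second"


def numa_node_of(core: int) -> int:
    """Map a compute-core index to its NUMA node under the numbering assumption."""
    if not 0 <= core < NUMA_NODES * CORES_PER_NODE:
        raise ValueError(f"core {core} is out of range")
    return core // CORES_PER_NODE


print(NUMA_NODES * CORES_PER_NODE)                         # 48
print(HBM2_GB / NUMA_NODES)                                # 8.0
print(HBM2_GBPS / NUMA_NODES)                              # 250.0
print(numa_node_of(0), numa_node_of(13), numa_node_of(47)) # 0 1 3
```

Keeping a thread's data in the HBM2 attached to its own NUMA node is the usual reason to care about this mapping.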

Fujitsu intends to produce lower-specification machines with fewer assistant cores. [2] [3] Fujitsu also claims reliability, availability and serviceability (RAS) features, including about 128,400 error checkers in total.

In June 2020 the Fugaku supercomputer, built on this processor, became the fastest supercomputer in the world; on the November 2020 TOP500 list it reached 442 petaFLOPS.
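The 442 petaFLOPS figure is a whole-system benchmark result. For scale, a rough theoretical double-precision peak for a single chip can be estimated as below. The 48 cores come from this article and the 2.2 GHz clock from the TOP500 system name cited in the references; the two 512-bit FMA pipelines per core is an assumption not stated in this article, and an FMA is counted as two floating-point operations.

```python
# Rough per-chip FP64 peak. Assumption: two 512-bit FMA pipelines per core
# (not stated in this article). An FMA counts as two floating-point operations.
CORES = 48
CLOCK_HZ = 2.2e9
FMA_PIPES_PER_CORE = 2   # assumption
FP64_LANES = 512 // 64   # 8 doubles per 512-bit vector
FLOPS_PER_FMA = 2

flops_per_cycle_per_core = FMA_PIPES_PER_CORE * FP64_LANES * FLOPS_PER_FMA  # 32
peak_flops = CORES * CLOCK_HZ * flops_per_cycle_per_core

print(f"{peak_flops / 1e12:.2f} TFLOPS per chip")  # 3.38 TFLOPS per chip
```

Efficiency claims such as ">90%" for GEMM are ratios of achieved throughput to a theoretical peak of this kind.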

Implementations

Fujitsu designed the A64FX for Fugaku, which ranked as the fastest supercomputer in the world on the June and November 2020 TOP500 lists. [9] Fujitsu also intends to sell smaller machines with A64FX processors. [2] [3] AnandTech reported in June 2020 that a PRIMEHPC FX700 server with two A64FX nodes cost ¥4,155,330 (about US$39,000). [10]

Cray announced supercomputers using the A64FX. [11] [12] Isambard 2, a supercomputer built with the Fujitsu processors, is intended for a consortium in the United Kingdom led by the University of Bristol and including the Met Office. [13] [14] It is an upgrade to the original Isambard supercomputer, which was built with the Marvell ThunderX2, another ARM architecture microprocessor. [14]

Ookami, an open testbed system supported by the NSF and run by Stony Brook University and the University at Buffalo, provides researchers with access to A64FX processors.


References

  1. Yoshida, Toshio. "Hot Chips 30 conference; Fujitsu briefing" (PDF). Archived from the original (PDF) on 5 December 2020.
  2. "Fujitsu Launches New PRIMEHPC Supercomputers Using Fugaku Technology - Fujitsu Global". www.fujitsu.com. 13 November 2019. Retrieved 28 June 2020.
  3. "FUJITSU Supercomputer PRIMEHPC Specifications". www.fujitsu.com. Retrieved 28 June 2020.
  4. "Fujitsu Successfully Triples the Power Output of Gallium-Nitride Transistors - Fujitsu Global". www.fujitsu.com. Fujitsu. Retrieved 8 March 2020.
  5. Morgan, Timothy Prickett (24 August 2018). "Fujitsu's A64FX Arm Chip Waves The HPC Banner High". The Next Platform. Retrieved 8 March 2020.
  6. "June 2022 | TOP500". www.top500.org. Retrieved 23 June 2023.
  7. "Outline of the Development of the Supercomputer Fugaku | RIKEN Center for Computational Science RIKEN Website". www.r-ccs.riken.jp. Archived from the original on 23 January 2021. Retrieved 18 November 2020.
  8. Odajima, Tetsuya; Kodama, Yuetsu; Tsuji, Miwako; Matsuda, Motohiko; Maruyama, Yutaka; Sato, Mitsuhisa (September 2020). "Preliminary Performance Evaluation of the Fujitsu A64FX Using HPC Applications". 2020 IEEE International Conference on Cluster Computing (CLUSTER). pp. 523–530. doi:10.1109/CLUSTER49012.2020.00075. ISBN 978-1-7281-6677-3. S2CID 226266547.
  9. "Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D | TOP500". www.top500.org. Retrieved 18 November 2020.
  10. Cutress, Ian (26 June 2020). "HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k". www.anandtech.com. Retrieved 28 June 2020.
  11. "Cray, Fujitsu Both Bringing Fujitsu A64FX-based Supercomputers to Market in 2020". HPCwire. 13 November 2019. Retrieved 8 March 2020.
  12. Tsukimori, Osamu (7 January 2021). "Japan's Fugaku supercomputer is tackling some of the world's biggest problems". The Japan Times. Retrieved 26 January 2021.
  13. University of Bristol. "February: GW4 Isambard - News and features - University of Bristol". www.bristol.ac.uk. Retrieved 8 March 2020.
  14. Burt, Jeffrey (9 March 2020). "Isambard 2 Is About Driving Technology Diversity". The Next Platform. Retrieved 9 March 2020.