AArch64

ARM AArch64 (64/32-bit)
Introduced 2011
Version ARMv8-R, ARMv8-A, ARMv8.1-A, ARMv8.2-A, ARMv8.3-A, ARMv8.4-A, ARMv8.5-A, ARMv8.6-A, ARMv8.7-A, ARMv8.8-A, ARMv8.9-A, ARMv9.0-A, ARMv9.1-A, ARMv9.2-A, ARMv9.3-A, ARMv9.4-A, ARMv9.5-A, ARMv9.6-A
Encoding AArch64/A64 and AArch32/A32 use 32-bit instructions, AArch32/T32 (Thumb-2) uses mixed 16- and 32-bit instructions [1]
Endianness Bi (little as default)
Extensions SVE, SVE2, SME, AES, SM3, SM4, SHA, CRC32, RNDR, TME; All mandatory: Thumb-2, Neon, VFPv4-D16, VFPv4; obsolete: Jazelle
Registers
General-purpose 31 × 64-bit integer registers [1]
Floating point 32 × 128-bit registers [1] for scalar 32- and 64-bit FP or SIMD FP or integer; or cryptography

AArch64 or ARM64 is the 64-bit Execution state of the ARM architecture family. It was first introduced with the Armv8-A architecture, and has had many extension updates. [2]

AArch64 Execution state

Naming conventions

AArch64 features

Extension: Data gathering hint (ARMv8.0-DGH).

AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A, and in all versions of ARMv9-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.

A64 instruction formats

The main opcode for selecting which group an A64 instruction belongs to is at bits 25–28.

A64 instruction formats (fields listed in order from bit 31 down to bit 0; a bare four-bit pattern is the value of bits 28–25)

Type                                        Fields
Reserved                                    0, op0, 0000, op1
SME                                         1, op0, 0000, varies
Unallocated                                 0001
SVE                                         0010, varies
Unallocated                                 0011
Data Processing Immediate (PC-rel.)         op, immlo, 10000, immhi, Rd
Data Processing Immediate (others)          sf, 100, 01–11, Rd
Branches + System Instructions              op0, 101, op1, op2
Load and Store Instructions                 op0, 1, op1, 0, op2, op3, op4
Data Processing Register                    sf, op0, op1, 101, op2, op3
Data Processing Floating Point and SIMD     op0, 111, op1, op2, op3
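
As a rough illustration of how bits 25–28 select the instruction group, the following Python sketch classifies a 32-bit A64 instruction word. The function name `a64_group` and the returned strings are hypothetical and chosen to mirror the table rows; the bit patterns are read off the table (with `x` meaning "either value"), so this is a sketch of the top-level split only, not a decoder.

```python
def a64_group(word: int) -> str:
    """Classify a 32-bit A64 instruction word by its main opcode (bits 28..25).

    Hypothetical helper: names follow the rows of the format table.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")

    op = (word >> 25) & 0xF   # main opcode, bits 28..25
    top = (word >> 31) & 1    # bit 31 separates Reserved from SME

    if op == 0b0000:
        return "SME" if top else "Reserved"
    if op == 0b0010:
        return "SVE"
    if op in (0b0001, 0b0011):
        return "Unallocated"
    if op & 0b1110 == 0b1000:      # 100x
        return "Data Processing Immediate"
    if op & 0b1110 == 0b1010:      # 101x
        return "Branches + System Instructions"
    if op & 0b0101 == 0b0100:      # x1x0
        return "Load and Store Instructions"
    if op & 0b0111 == 0b0101:      # x101
        return "Data Processing Register"
    return "Data Processing Floating Point and SIMD"  # remaining case: x111


if __name__ == "__main__":
    # 0x8B020020 is the encoding of ADD X0, X1, X2; 0xD503201F is NOP.
    for w in (0x8B020020, 0xD503201F, 0xF9400020, 0x1E622820):
        print(f"{w:#010x}: {a64_group(w)}")
```

Each group is decoded further by the remaining fields in its row, which this sketch does not attempt.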

ARM-A (application architecture)

Armv8-A platform with Cortex-A57/A53 MPCore big.LITTLE CPU chip

Announced in October 2011, [4] ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit Execution state, named "AArch64", and the associated new "A64" instruction set, in addition to a 32-bit Execution state, "AArch32", supporting the 32-bit "A32" (original 32-bit Arm) and "T32" (Thumb/Thumb-2) instruction sets. The latter instruction sets provide user-space compatibility with the existing 32-bit ARMv7-A architecture. ARMv8-A allows 32-bit applications to be executed in a 64-bit OS, and a 32-bit OS to be under the control of a 64-bit hypervisor. [1]

ARM announced its Cortex-A53 and Cortex-A57 cores on 30 October 2012. [5] Apple was the first to release an ARMv8-A compatible core (Cyclone) in a consumer product (iPhone 5S). AppliedMicro, using an FPGA, was the first to demo ARMv8-A. [6] The first ARMv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features two clusters, one of four Cortex-A57 cores and one of four Cortex-A53 cores, in a big.LITTLE configuration; however, it runs only in AArch32 mode. [7]

ARMv8-A includes VFPv3/v4 and advanced SIMD (Neon) as standard features in both AArch32 and AArch64. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic. [8]

An ARMv8-A processor can support one or both of AArch32 and AArch64; it may support AArch32 and AArch64 at lower Exception levels and only AArch64 at higher Exception levels. [9] For example, the ARM Cortex-A32 supports only AArch32, [10] the ARM Cortex-A34 supports only AArch64, [11] and the ARM Cortex-A72 supports both AArch64 and AArch32. [12] An ARMv9-A processor must support AArch64 at all Exception levels, and may support AArch32 at EL0. [9]

ARMv8.1-A

In December 2014, ARMv8.1-A, [13] an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.

Instruction set enhancements included the following:

Enhancements for the exception model and memory translation system included the following:

ARMv8.2-A

In January 2016, ARMv8.2-A was announced. [15] Its enhancements fell into four categories:

Scalable Vector Extension (SVE)

The Scalable Vector Extension (SVE) is "an optional extension to the ARMv8.2-A architecture and newer" developed specifically for vectorization of high-performance computing scientific workloads. [16] [17] The specification allows for variable vector lengths to be implemented from 128 to 2048 bits. The extension is complementary to, and does not replace, the NEON extensions.
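
The practical consequence of a variable vector length is that code is written without assuming how many elements fit in one vector. The Python sketch below only models that idea; it is not SVE code. The function `vla_add`, its parameters, and the 32-bit default element size are illustrative assumptions: the loop processes as many lanes as the given vector length holds and masks off the tail, loosely in the way a predicate would.

```python
def vla_add(a, b, vl_bits, elem_bits=32):
    """Element-wise add written without a fixed vector width.

    Hypothetical model: vl_bits stands in for the hardware vector length,
    which the text gives as anywhere from 128 to 2048 bits.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if not 128 <= vl_bits <= 2048 or vl_bits % elem_bits != 0:
        raise ValueError("unsupported vector length for this sketch")

    lanes = vl_bits // elem_bits          # elements processed per "vector"
    out = []
    for start in range(0, len(a), lanes):
        # Only the lanes that still map to real elements are active,
        # which plays the role of a predicate on the final iteration.
        active = min(lanes, len(a) - start)
        out.extend(a[start + i] + b[start + i] for i in range(active))
    return out


if __name__ == "__main__":
    a = list(range(37))
    b = [10] * 37
    # The same loop gives the same result at every modelled vector length.
    print(all(vla_add(a, b, vl) == vla_add(a, b, 128) for vl in (256, 512, 2048)))
```

The point of the design is that the same loop runs unchanged on a 128-bit and a 2048-bit implementation; only the number of iterations differs.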

A 512-bit SVE variant is implemented in the Fujitsu A64FX ARM processor used by the Fugaku supercomputer; [18] Fugaku was the fastest supercomputer in the world for two years, from June 2020 [19] to May 2022. [20] A more flexible version, 2x256 SVE, was implemented by the AWS Graviton3 ARM processor.

SVE is supported by the GCC compiler, with GCC 8 supporting automatic vectorization [17] and GCC 10 supporting C intrinsics. As of July 2020, LLVM and clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization. [21]

ARMv8.3-A

In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories: [22]

The ARMv8.3-A architecture is supported by GCC 7 and later, selected with the -march=armv8.3-a option. [27]

ARMv8.4-A

In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories: [28] [29] [30]

ARMv8.5-A and ARMv9.0-A

In September 2018, ARMv8.5-A was announced. Its enhancements fell into these categories: [31] [32] [33]

On 2 August 2019, Google announced Android would adopt Memory Tagging Extension (MTE). [35]

In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all the features from ARMv8.5. [36] [37] [38] ARMv9-A also adds:

ARMv8.6-A and ARMv9.1-A

In September 2019, ARMv8.6-A was announced. Its enhancements fell into these categories: [31] [43]

Examples include fine-grained traps, Wait-for-Event (WFE) instructions, EnhancedPAC2 and FPAC. The bfloat16 extensions for SVE and Neon are mainly for deep learning use. [45]
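
For readers unfamiliar with bfloat16: the format keeps the sign bit and 8-bit exponent of an IEEE 754 single-precision value but only 7 fraction bits, so a bfloat16 value is effectively the upper 16 bits of a float32. The Python sketch below illustrates the format only. The function names are hypothetical, and round-to-nearest-even is an assumption made for the example; the text above does not say how the Arm instructions round.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Convert a Python float to a 16-bit bfloat16 pattern (hypothetical helper).

    Assumes round-to-nearest-even; x must be representable as a float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent_all_ones = (bits & 0x7F800000) == 0x7F800000
    if exponent_all_ones and (bits & 0x007FFFFF):
        # NaN: keep the top bits and force a fraction bit so it stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    import math
    b = float_to_bfloat16_bits(math.pi)
    print(hex(b), bfloat16_bits_to_float(b))
```

The reduced fraction width trades precision for the full float32 exponent range, which is why the format suits deep learning workloads.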

ARMv8.7-A and ARMv9.2-A

In September 2020, ARMv8.7-A was announced. Its enhancements fell into these categories: [31] [46]

ARMv8.8-A and ARMv9.3-A

In September 2021, ARMv8.8-A and ARMv9.3-A were announced. Their enhancements fell into these categories: [31] [48]

LLVM 15 supports ARMv8.8-A and ARMv9.3-A. [49]

ARMv8.9-A and ARMv9.4-A

In September 2022, ARMv8.9-A and ARMv9.4-A were announced, including: [50]

ARMv9.5-A

In October 2023, ARMv9.5-A was announced, including: [51]

ARMv9.6-A

In October 2024, ARMv9.6-A was announced, including: [52]

ARM-R (real-time architecture)

The Armv8-R profile of the ARM-R architecture is designed for real-time applications, where predictable and deterministic behavior is essential. It targets embedded systems that combine hard real-time constraints with requirements for high performance, reliability, and efficiency.

The Armv8-R profile later gained optional AArch64 support. The Cortex-R82 [53] is the first processor to implement this support, bringing several new features and improvements to the real-time domain. [54]

Key Features of Armv8-R with AArch64 Support

  1. AArch64 Instruction Set (A64):
    • The A64 instruction set [26] in the Cortex-R82 provides 64-bit data handling and operations, which improves performance for certain computational tasks and enhances overall system efficiency. [53]
    • Example Instruction: ADD X0, X1, X2 adds the values in 64-bit registers X1 and X2 and stores the result in X0. A single instruction thus operates on full 64-bit values, whereas the earlier A32 instruction set operates on 32-bit registers.
  2. Enhanced Memory Management:
    • Memory Barrier Instructions: The Cortex-R82 introduces improved memory barrier instructions to ensure proper ordering of memory operations, which is critical in real-time systems where the timing of memory operations must be strictly controlled. [55]
      • Data Synchronization Barrier (DSB): Ensures that all data accesses before the barrier are completed before continuing with subsequent operations.
      • Data Memory Barrier (DMB): Guarantees that memory accesses before the barrier are observed before any memory accesses after the barrier; unlike DSB, it orders memory accesses only and does not hold back other instructions.
    • Example: In a real-time automotive control system, DSB might be used to ensure that sensor data is fully written to memory before the system proceeds with processing or decision-making, preventing data corruption or inconsistencies.
  3. Improved Address Space:
    • 64-bit Addressing: AArch64 allows the Cortex-R82 to address a much larger memory space compared to its 32-bit predecessors, making it suitable for applications requiring extensive memory.
    • Example: A complex industrial automation system can utilize the expanded address space to manage large data sets and buffers more efficiently, improving system performance and capability.
  4. Real-Time Performance Enhancements:
    • Interrupt Handling: With AArch64 support, the Cortex-R82 can handle interrupts with lower latency and improved predictability, crucial for real-time operations.
    • Example: In a robotics application, the Cortex-R82's enhanced interrupt handling can ensure timely responses to external stimuli, such as changes in sensor data or control commands.
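
To make the ADD X0, X1, X2 example above concrete, the Python sketch below builds the 32-bit encoding of a 64-bit register-register ADD with no shift and models its wrap-around arithmetic. The helper names `encode_add_x`, `add64`, and `add32` are hypothetical, and the sketch covers only this one instruction form.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def encode_add_x(rd: int, rn: int, rm: int) -> int:
    """Encode ADD Xd, Xn, Xm (64-bit, shifted-register form, shift amount 0).

    Hypothetical helper; register number 31 is excluded because it does not
    name a general-purpose register in this form.
    """
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 30:
            raise ValueError("register number must be in 0..30")
    # 0x8B000000 sets sf=1 (64-bit) and the fixed opcode bits;
    # Rm sits in bits 20..16, Rn in bits 9..5, Rd in bits 4..0.
    return 0x8B000000 | (rm << 16) | (rn << 5) | rd


def add64(a: int, b: int) -> int:
    """Model a 64-bit register add: the result wraps modulo 2**64."""
    return (a + b) & MASK64


def add32(a: int, b: int) -> int:
    """Model a 32-bit register add for comparison: wraps modulo 2**32."""
    return (a + b) & MASK32


if __name__ == "__main__":
    print(hex(encode_add_x(0, 1, 2)))       # encoding of ADD X0, X1, X2
    print(hex(add64(0xFFFFFFFF, 1)))        # fits in 64 bits
    print(hex(add32(0xFFFFFFFF, 1)))        # wraps in 32 bits
```

The comparison between `add64` and `add32` shows the practical difference: a sum that overflows a 32-bit register is held exactly in a 64-bit one.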

References

  1. Grisenthwaite, Richard (2011). "ARMv8-A Technology Preview" (PDF). Archived from the original (PDF) on 11 November 2011. Retrieved 31 October 2011.
  2. "Overview". Learn the architecture: Understanding the Armv8.x and Armv9.x extensions.
  3. "Cortex-A32 Processor – ARM". Retrieved 18 December 2016.
  4. "ARM Discloses Technical Details Of The Next Version Of The ARM Architecture" (Press release). Arm Holdings. 27 October 2011. Archived from the original on 1 January 2019. Retrieved 20 September 2013.
  5. "ARM Launches Cortex-A50 Series, the World's Most Energy-Efficient 64-bit Processors" (Press release). Arm Holdings. Retrieved 31 October 2012.
  6. "AppliedMicro Showcases World's First 64-bit ARM v8 Core" (Press release). AppliedMicro. 28 October 2011. Retrieved 11 February 2014.
  7. "Samsung's Exynos 5433 is an A57/A53 ARM SoC". AnandTech. Retrieved 17 September 2014.
  8. "ARM Cortex-A53 MPCore Processor Technical Reference Manual: Cryptography Extension". ARM. Retrieved 11 September 2016.
  9. "Impact of implemented Exception levels". Learn the architecture - AArch64 Exception Model. Arm.
  10. "Cortex-A32". Arm Developer.
  11. "Cortex-A34". Arm Developer.
  12. "Cortex-A72". Arm Developer.
  13. Brash, David (2 December 2014). "The ARMv8-A architecture and its ongoing development". Retrieved 23 January 2015.
  14. "Top-byte ignore (TBI)". WikiChip.
  15. Brash, David (5 January 2016). "ARMv8-A architecture evolution". Retrieved 7 June 2016.
  16. "The scalable vector extension sve for the ARMv8 a architecture". Arm Community. 22 August 2016. Retrieved 8 July 2018.
  17. "GCC 8 Release Series – Changes, New Features, and Fixes – GNU Project – Free Software Foundation (FSF)". gcc.gnu.org. Retrieved 9 July 2018.
  18. "Fujitsu Completes Post-K Supercomputer CPU Prototype, Begins Functionality Trials – Fujitsu Global". www.fujitsu.com (Press release). Retrieved 8 July 2018.
  19. "Japan's Fugaku gains title as world's fastest supercomputer" (Press release). www.riken.jp. 23 June 2020. Retrieved 7 December 2020.
  20. "ORNL's Frontier First to Break the Exaflop Ceiling". Top500. 30 May 2022. Retrieved 30 May 2022.
  21. "⚙ D71712 Downstream SVE/SVE2 implementation (LLVM)". reviews.llvm.org.
  22. David Brash (26 October 2016). "ARMv8-A architecture – 2016 additions".
  23. "[Ping~,AArch64] Add commandline support for -march=armv8.3-a". pointer authentication extension is defined to be mandatory extension on ARMv8.3-A and is not optional
  24. "Pointer Authentication on Arm". ARM. Retrieved 5 March 2025.
  25. "Qualcomm releases whitepaper detailing pointer authentication on ARMv8.3". 10 January 2017.
  26. "A64 Floating-point Instructions: FJCVTZS". arm.com. Retrieved 11 July 2019.
  27. "GCC 7 Release Series – Changes, New Features, and Fixes". The ARMv8.3-A architecture is now supported. It can be used by specifying the -march=armv8.3-a option. [..] The option -msign-return-address= is supported to enable return address protection using ARMv8.3-A Pointer Authentication Extensions.
  28. "Introducing 2017's extensions to the Arm Architecture". community.arm.com. 2 November 2017. Retrieved 15 June 2019.
  29. "Exploring dot product machine learning". community.arm.com. 6 December 2017. Retrieved 15 June 2019.
  30. "ARM Preps ARMv8.4-A Support For GCC Compiler – Phoronix". www.phoronix.com. Retrieved 14 January 2018.
  31. "ARMv8.x and ARMv9.x extensions and features". Learn the architecture: Understanding the ARMv8.x and ARMv9.x extensions.
  32. "Arm Architecture ARMv8.5-A Announcement – Processors blog – Processors – Arm Community". community.arm.com. Retrieved 26 April 2019.
  33. "Arm Architecture Reference Manual ARMv8, for ARMv8-A architecture profile". ARM Developer. Retrieved 6 August 2019.
  34. "Arm MTE architecture: Enhancing memory safety". community.arm.com. 5 August 2019. Retrieved 27 July 2021.
  35. "Adopting the Arm Memory Tagging Extension in Android". Google Online Security Blog. Retrieved 6 August 2019.
  36. "Arm's solution to the future needs of AI, security and specialized computing is v9". Arm | The Architecture for the Digital World. Retrieved 27 July 2021.
  37. Schor, David (30 March 2021). "Arm Launches ARMv9". WikiChip Fuse. Retrieved 27 July 2021.
  38. Frumusanu, Andrei. "Arm Announces ARMv9 Architecture: SVE2, Security, and the Next Decade". www.anandtech.com. Retrieved 27 July 2021.
  39. "Arm releases SVE2 and TME for A-profile architecture – Processors blog – Processors – Arm Community". community.arm.com. 18 April 2019. Retrieved 25 May 2019.
  40. "Arm SVE2 Support Aligning For GCC 10, LLVM Clang 9.0 – Phoronix". www.phoronix.com. Retrieved 26 May 2019.
  41. "Unlocking the power of data with Arm CCA". community.arm.com. 23 June 2021. Retrieved 27 July 2021.
  42. "Arm Introduces Its Confidential Compute Architecture". WikiChip Fuse. 23 June 2021. Retrieved 27 July 2021.
  43. "Arm A profile architecture update 2019". community.arm.com. 25 September 2019. Retrieved 26 September 2019.
  44. "LLVM 11.0.0 Release Notes". releases.llvm.org. Retrieved 11 March 2021.
  45. "BFloat16 extensions for ARMv8-A". community.arm.com. 29 August 2019. Retrieved 30 August 2019.
  46. Weidmann, Martin (21 September 2020). "Arm A-Profile Architecture Developments 2020". community.arm.com. ARM. Retrieved 28 September 2022.
  47. "Scalable Matrix Extension for the ARMv9-A Architecture". community.arm.com. 14 July 2021. Retrieved 27 July 2021.
  48. Weidmann, Martin (8 September 2021). "Arm A-Profile Architecture Developments 2021". community.arm.com. ARM. Retrieved 28 September 2022.
  49. "What is New in LLVM 15? - Architectures and Processors blog - Arm Community blogs - Arm Community". 27 February 2023. Retrieved 15 April 2023.
  50. "Arm A-Profile Architecture Developments 2022 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 29 September 2022. Retrieved 9 December 2022.
  51. "Arm A-Profile Architecture Developments 2023 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 5 October 2023. Retrieved 14 October 2024.
  52. "Arm A-Profile Architecture Developments 2024 - Architectures and Processors blog - Arm Community blogs - Arm Community". community.arm.com. 1 October 2024. Retrieved 14 October 2024.
  53. Frumusanu, Andrei (3 September 2020). "ARM Announced Cortex-R82: First 64-bit Real Time Processor". AnandTech.
  54. "Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile". Arm Ltd.
  55. "Cortex-R82 Technical Reference Manual".