Block floating point

Block floating point (BFP) is a method used to provide an arithmetic approaching floating point while using a fixed-point processor. BFP assigns a single exponent to a group of significands (the non-exponent parts of floating-point numbers), rather than each significand carrying its own exponent. Because the exponent is shared across the block, BFP can perform the same functions as floating-point algorithms with less hardware; some operations over multiple values between blocks can also be done with a reduced amount of computation. [1]
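As a minimal illustration (the values and field names below are hypothetical, not a standard encoding), a BFP block stores fixed-point integer mantissas plus one exponent shared by all of them:

```python
# A BFP block: several integer mantissas share a single exponent,
# so the per-element exponent storage of ordinary floating point
# is eliminated. Values and field names here are illustrative.
block = {
    "mantissas": [96, -48, 12, 3],  # fixed-point significands
    "exponent": -7,                 # one exponent for the whole block
}

# Each represented value is mantissa * 2**exponent.
values = [m * 2.0 ** block["exponent"] for m in block["mantissas"]]
print(values)  # [0.75, -0.375, 0.09375, 0.0234375]
```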

The common exponent is determined by the element with the largest magnitude in the block. Its value is obtained by counting that element's leading zeros, which gives the number of left shifts needed to normalize the data to the dynamic range of the processor used. Some processors provide dedicated instructions for this, such as exponent detection and normalization instructions. [2] [3]
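A sketch of this encoding step in Python (the 16-bit word width, rounding, and function names are assumptions for illustration, not a particular processor's instruction set):

```python
import math

# BFP encoding sketch for a 16-bit fixed-point word (the word width,
# rounding, and names are illustrative assumptions, not a specific
# processor's instructions).
def bfp_encode(values, frac_bits=15):
    # The shared exponent is set by the largest-magnitude element; on a
    # fixed-point DSP this is where a count-leading-zeros or exponent
    # detection instruction would be used.
    max_mag = max(abs(v) for v in values)
    block_exp = math.floor(math.log2(max_mag)) + 1 if max_mag > 0 else 0
    # Scale every element by the same power of two so the largest one
    # just fits the processor's dynamic range, then round to integers.
    scale = 2.0 ** (frac_bits - block_exp)
    mantissas = [round(v * scale) for v in values]
    return mantissas, block_exp - frac_bits    # value = m * 2**exponent

def bfp_decode(mantissas, exponent):
    return [m * 2.0 ** exponent for m in mantissas]

m, e = bfp_encode([0.75, -0.375, 0.09375])
print(m, e)              # [24576, -12288, 3072] -15
print(bfp_decode(m, e))  # [0.75, -0.375, 0.09375]
```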

Block floating-point algorithms were extensively studied by James Hardy Wilkinson. [4] [5] [6]

BFP can also be implemented purely in software, though the performance gains are smaller than with hardware support.

Microscaling (MX) Formats

Microscaling (MX) formats are a type of Block Floating Point (BFP) data format specifically designed for AI and machine learning workloads. The MX format, endorsed and standardized by major industry players such as AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm, represents a significant advancement in narrow precision data formats for AI. [7] [8] [9]

The MX format uses a single shared scaling factor (exponent) for a block of elements, significantly reducing the memory footprint and computational resources required for AI operations. Each block of k elements shares this common scaling factor, which is stored separately from the individual elements.
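A hedged sketch of this scheme in Python, using MXFP4 as the element type (the block size of 32, the power-of-two E8M0 scale, and the scale-selection rule follow the OCP specification; the helper names are illustrative, not a library API):

```python
import math

# MX-style block quantization with FP4 (E2M1) elements: a sketch under
# OCP MX v1.0 assumptions (k = 32, power-of-two E8M0 shared scale).
# Function and constant names are illustrative, not a library API.
K = 32                                             # elements per block
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2                                      # max exponent of E2M1

def to_e2m1(x):
    # Round to the nearest representable FP4 magnitude, keeping the
    # sign; magnitudes beyond 6.0 saturate to the format's maximum.
    mag = min(E2M1_MAGNITUDES, key=lambda v: abs(v - abs(x)))
    return math.copysign(mag, x)

def mxfp4_quantize(block):
    assert len(block) == K
    amax = max(abs(x) for x in block)
    # Shared scale exponent: align the block's largest magnitude with
    # the element format's largest exponent (the OCP spec's rule).
    scale_exp = math.floor(math.log2(amax)) - E2M1_EMAX if amax > 0 else 0
    elements = [to_e2m1(x / 2.0 ** scale_exp) for x in block]
    return elements, scale_exp                     # k elements + 1 scale

def mxfp4_dequantize(elements, scale_exp):
    return [e * 2.0 ** scale_exp for e in elements]

# Round-trip a block: quantization error is bounded by the 4-bit grid.
data = [(-1) ** i * (i + 1) / 37.0 for i in range(K)]
q, s = mxfp4_quantize(data)
print(s, mxfp4_dequantize(q, s)[:4])
```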

The initial MX specification introduces several concrete formats: MXFP8, MXFP6, MXFP4, and MXINT8. All share a block size of 32 elements and an 8-bit power-of-two scale (E8M0), and differ in the element data type:

MXFP8: 8-bit floating-point elements (E4M3 or E5M2)
MXFP6: 6-bit floating-point elements (E2M3 or E3M2)
MXFP4: 4-bit floating-point elements (E2M1)
MXINT8: 8-bit scaled-integer elements
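Since each 32-element block adds only one 8-bit scale, the amortized storage cost is just a quarter-bit above the element width; a small check (assuming the spec's k = 32 and E8M0 scale):

```python
# Amortized storage per element for each MX format, assuming the
# OCP v1.0 parameters: block size k = 32 and an 8-bit shared scale.
K, SCALE_BITS = 32, 8
for name, elem_bits in [("MXFP8", 8), ("MXFP6", 6), ("MXFP4", 4), ("MXINT8", 8)]:
    total_bits = K * elem_bits + SCALE_BITS
    print(f"{name}: {total_bits / K:.2f} bits/element")
# MXFP8: 8.25  MXFP6: 6.25  MXFP4: 4.25  MXINT8: 8.25
```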

MX formats have been demonstrated to be effective in a variety of AI tasks, including large language models (LLMs), image classification, speech recognition and recommendation systems. [10] For instance, MXFP6 closely matches FP32 for inference tasks after quantization-aware fine-tuning, and MXFP4 can be used for training generative language models with only a minor accuracy penalty.

The MX format has been standardized through the Open Compute Project (OCP) as Microscaling Formats (MX) Specification v1.0. [7] An emulation library has also been published to provide details on the data science approach and selected results of MX in action. [11]

Hardware support

The following hardware supports BFP operations:

d-Matrix's chiplet-based generative AI inference platform [12] [13]
Tenstorrent AI accelerators [14]
AMD Ryzen AI 300 series processors with the XDNA 2 NPU [15]
AMD Instinct accelerators and EPYC processors announced at Computex 2024 [16]

References

  1. "Block floating point". BDTI DSP Dictionary. Berkeley Design Technology, Inc. (BDTI). Archived from the original on 2018-07-11. Retrieved 2015-11-01.
  2. Chhabra, Arun; Iyer, Ramesh (December 1999). "TMS320C55x A Block Floating Point Implementation on the TMS320C54x DSP" (PDF) (Application report). Digital Signal Processing Solutions. Texas Instruments. SPRA610. Archived (PDF) from the original on 2018-07-11. Retrieved 2018-07-11.
  3. Elam, David; Iovescu, Cesar (September 2003). "A Block Floating Point Implementation for an N-Point FFT on the TMS320C55x DSP" (PDF) (Application report). TMS320C5000 Software Applications. Texas Instruments. SPRA948. Archived (PDF) from the original on 2018-07-11. Retrieved 2015-11-01.
  4. Wilkinson, James Hardy (1963). Rounding Errors in Algebraic Processes (1 ed.). Englewood Cliffs, NJ, USA: Prentice-Hall, Inc. ISBN 978-0-486-67999-0. MR 0161456.
  5. Muller, Jean-Michel; Brisebarre, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Stehlé, Damien; Torres, Serge (2010). Handbook of Floating-Point Arithmetic (1 ed.). Birkhäuser. doi:10.1007/978-0-8176-4705-6. ISBN 978-0-8176-4704-9. LCCN 2009939668.
  6. Overton, Michael L. (2001). Numerical Computing with IEEE Floating Point Arithmetic - Including One Theorem, One Rule of Thumb and One Hundred and One Exercises (1 ed.). Society for Industrial and Applied Mathematics (SIAM). ISBN 0-89871-482-6.
  7. "Open Compute Project". Open Compute Project. Retrieved 2024-06-03.
  8. Rouhani, Bita Darvish; Zhao, Ritchie; More, Ankit; Hall, Mathew; Khodamoradi, Alireza; Deng, Summer; Choudhary, Dhruv; Cornea, Marius; Dellinger, Eric (2023-10-19). "Microscaling Data Formats for Deep Learning". arXiv: 2310.10537 [cs.LG].
  9. Borkar, Rani; D'Sa, Reynold (2023-10-17). "Fostering AI infrastructure advancements through standardization". Microsoft Azure Blog. Retrieved 2024-06-03.
  10. Rouhani, Bita; Zhao, Ritchie; Elango, Venmugil; Shafipour, Rasoul; Hall, Mathew; Mesmakhosroshahi, Maral; More, Ankit; Melnick, Levi; Golub, Maximilian (2023-04-12). "With Shared Microexponents, A Little Shifting Goes a Long Way". arXiv: 2302.08007 [cs.LG].
  11. "microsoft/microxcaling". GitHub. Microsoft. 2024-05-29. Retrieved 2024-06-03.
  12. Clarke, Peter (2023-08-28). "Chiplet-based generative AI platform raises LLM performance". eeNews Europe. Retrieved 2024-04-23.
  13. "[SPCL_Bcast] A chiplet based generative inference architecture with block floating point datatypes". Retrieved 2024-04-23 via www.youtube.com.
  14. "Tenstorrent AI Accelerators" (PDF).
  15. Bonshor, Gavin. "AMD Announces The Ryzen AI 300 Series For Mobile: Zen 5 With RDNA 3.5, and XDNA2 NPU With 50 TOPS". www.anandtech.com. Retrieved 2024-06-03.
  16. "AMD Extends AI and High-Performance Leadership in Data Center and PCs with New AMD Instinct, Ryzen and EPYC Processors at Computex 2024". Advanced Micro Devices, Inc. 2024-06-02. Retrieved 2024-06-03.
