TensorFloat-32

Last updated April 15, 2025

TensorFloat-32 (TF32) is a floating-point number format designed for the Tensor Cores of certain Nvidia GPUs.

Format

The binary format is:

1 sign bit
8 exponent bits
10 significand bits (also called mantissa, or precision bits)
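
The layout above can be sketched in Python. This is a minimal illustration, not Nvidia's implementation: it assumes a TF32 value occupies the upper 19 bits of an IEEE 754 binary32 bit pattern (same sign and exponent positions, with only the top 10 of the 23 fraction bits kept). The function name `tf32_fields` is hypothetical.

```python
import struct


def tf32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, exponent, significand) for x under the assumed TF32 layout.

    Assumption: the fields sit in the upper 19 bits of the float32 encoding of x.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                    # 1 sign bit
    exponent = (bits >> 23) & 0xFF       # 8 exponent bits (biased by 127)
    significand = (bits >> 13) & 0x3FF   # top 10 of float32's 23 fraction bits
    return sign, exponent, significand
```

For example, `tf32_fields(1.0)` returns `(0, 127, 0)`: a clear sign bit, the biased exponent for 2^0, and an all-zero significand field.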

The 19 bits of the format fit within a double word (32 bits). Although TF32 has less precision than a standard 32-bit IEEE 754 floating-point number, it allows much faster computation: up to 8 times faster on an A100 than FP32 on a V100.^[1]
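
The precision loss can be emulated in software. The sketch below assumes that reducing a 32-bit float to TF32 precision amounts to clearing the 13 lowest fraction bits (truncation). Actual hardware may round differently, so treat this as an approximation. The function name `truncate_to_tf32` is hypothetical.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 precision by clearing the low 13 fraction bits of float32(x).

    Assumption: truncation is used here for simplicity; real hardware may round.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, and the top 10 fraction bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 1 + 2**-12 is exactly representable in float32 but not with 10 fraction bits.
fine_step = truncate_to_tf32(1.0 + 2**-12)
# 1 + 2**-10 needs exactly 10 fraction bits, so it survives unchanged.
coarse_step = truncate_to_tf32(1.0 + 2**-10)
```

Under this emulation, the spacing between representable values just above 1.0 is 2^-10 rather than the 2^-23 of a 32-bit float, so `fine_step` collapses to 1.0 while `coarse_step` is preserved.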

References

[1] https://deeprec.readthedocs.io/en/latest/NVIDIA-TF32.html (accessed 23 May 2024)

