Unums (universal numbers [1]) are a family of number formats and arithmetic for implementing real numbers on a computer, proposed by John L. Gustafson in 2015. [2] They are designed as an alternative to the ubiquitous IEEE 754 floating-point standard. The latest version is known as posits. [3]


Type I Unum

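The first version of unums, formally known as Type I unum, was introduced in Gustafson's book The End of Error as a superset of the IEEE-754 floating-point format. [2] The defining features of the Type I unum format are:

  • a variable-width storage format for both the significand and the exponent, and
  • a u-bit, which determines whether the unum corresponds to an exact number (u = 0) or to an interval between consecutive exact unums (u = 1). In this way, the unums cover the entire extended real number line [−∞, +∞].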

For computation with the format, Gustafson proposed using interval arithmetic with a pair of unums, what he called a ubound, providing the guarantee that the resulting interval contains the exact solution.

William M. Kahan and Gustafson debated unums at the Arith23 conference. [4] [5] [6] [7]

Type II Unum

Type II unums were introduced in 2016 [8] as a redesign of unums that broke IEEE-754 compatibility: in addition to the sign bit and the interval bit mentioned earlier, the Type II unum uses a bit to indicate inversion. Together, these three bits make it possible, starting from a finite set of points between one and infinity, to cover the entire projective line except for four points: the two exceptions, 0 and ∞, and then 1 and −1. The set of points is chosen arbitrarily, and arithmetic on them is performed with a lookup table rather than with logic. The size of such a table becomes prohibitive for an encoding format spanning multiple bytes. This limitation motivated the development of the Type III unum, known as the posit, discussed below.

Posit (Type III Unum)

In February 2017, Gustafson officially introduced Type III unums (posits), for fixed floating-point-like values and valids for interval arithmetic. [3] In March 2022, a standard was ratified and published by the Posit Working Group. [9]

Posits [3] [10] [11] are a hardware-friendly version of unum that resolves the difficulties the original Type I unum faced because of its variable size. Compared to IEEE 754 floats of similar size, posits offer a larger dynamic range and more fraction bits for values with magnitude near 1 (but fewer fraction bits for very large or very small values), and Gustafson claims that they offer better accuracy. [12] [13] Studies [14] [15] confirm that for some applications, posits with quire outperform floats in accuracy. Posits have superior accuracy in the range near one, where most computations occur. This makes them attractive for deep learning, where the current trend is to minimize the number of bits used. Because they carry more fraction bits near one, posits may let an application use fewer bits overall, reducing network and memory bandwidth and power requirements.

The format of an n-bit posit is given a label of "posit" followed by the decimal digits of n (e.g., the 16-bit posit format is "posit16") and consists of four sequential fields:

  1. sign: 1 bit, representing an unsigned integer s
  2. regime: at least 2 bits and up to (n − 1), representing an unsigned integer r as described below
  3. exponent: generally 2 bits as available after regime, representing an unsigned integer e
  4. fraction: all remaining bits available after exponent, representing a non-negative real dyadic rational f less than 1

The regime field uses unary coding of k identical bits, followed by a bit of opposite value if any remaining bits are available, to represent an unsigned integer r that is −k if the first bit is 0 or k − 1 if the first bit is 1. The sign, exponent, and fraction fields are analogous to IEEE 754 sign, exponent, and significand fields (respectively), except that the posit exponent and fraction fields may be absent or truncated and implicitly extended with zeroes: an absent exponent is treated as 00₂ (representing 0), a one-bit exponent E₁ is treated as E₁0₂ (representing the integer 0 if E₁ is 0 or 2 if E₁ is 1), and an absent fraction is treated as 0. Negative numbers (s is 1) are encoded as 2's complements.

The two encodings in which all non-sign bits are 0 have special interpretations:

  • If s is 1, the posit value is NaR ("not a real").
  • If s is 0, the posit value is 0.

Otherwise, the posit value is equal to $((1 - 3s) + f) \times 2^{(1 - 2s) \times (4r + e + s)}$, in which r scales by powers of 16, e scales by powers of 2, f distributes values uniformly between adjacent combinations of (r, e), and s adjusts the sign symmetrically about 0.


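| Type (positn) | Binary | Value | Notes |
|---|---|---|---|
| Any | 10… | NaR | anything not mathematically definable as a unique real number [9] |
| Any | 00111 0… | 0.5 | |
| Any | 00…1 | $2^{-4n+8}$ | smallest positive value |
| Any | 01… | $2^{4n-8}$ | largest positive value |
| posit8 | 00000001 | $2^{-24} \approx 6.0 \times 10^{-8}$ | smallest positive value |
| posit8 | 01111111 | $2^{24} \approx 1.7 \times 10^{7}$ | largest positive value |
| posit16 | 0000000000000001 | $2^{-56} \approx 1.4 \times 10^{-17}$ | smallest positive value |
| posit16 | 0111111111111111 | $2^{56} \approx 7.2 \times 10^{16}$ | largest positive value |
| posit32 | 00000000000000000000000000000001 | $2^{-120} \approx 7.5 \times 10^{-37}$ | smallest positive value |
| posit32 | 01111111111111111111111111111111 | $2^{120} \approx 1.3 \times 10^{36}$ | largest positive value |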
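
As a worked check of the smallest and largest positive encodings, consider a positive posit (s = 0), for which r contributes a factor of $16^{r}$, e a factor of $2^{e}$, and f the factor $(1 + f)$. The derivation below is a sketch that follows only from the regime rule stated above (a run of k zeros gives r = −k, a run of k ones gives r = k − 1):

$$
\begin{aligned}
\text{smallest positive } (00\ldots01):&\quad k = n - 2 \text{ zeros},\; r = -(n - 2),\; e = 0,\; f = 0 \;\Rightarrow\; 2^{4r} = 2^{-4n + 8} \\
\text{largest positive } (01\ldots1):&\quad k = n - 1 \text{ ones},\; r = n - 2,\; e = 0,\; f = 0 \;\Rightarrow\; 2^{4r} = 2^{4n - 8}
\end{aligned}
$$

For n = 8 this gives $2^{-24}$ and $2^{24}$; for n = 16, $2^{-56}$ and $2^{56}$; for n = 32, $2^{-120}$ and $2^{120}$.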

Note: a 32-bit posit is expected to be sufficient for almost all classes of applications [citation needed].


For each positn type of precision $n$, the standard defines a corresponding "quire" type quiren of precision $16 \times n$, used to accumulate exact sums of products of those posits without rounding or overflow in dot products for vectors of up to $2^{31}$ or more elements. The quire format is a two's complement signed integer, interpreted as a multiple of units of magnitude $2^{16 - 8n}$, except for the special value with a leading sign bit of 1 and all other bits equal to 0 (which represents NaR). Quires are based on the work of Ulrich W. Kulisch and Willard L. Miranker. [16]


Valids are described as a Type III Unum mode that bounds results in a given range. [3]


Several software and hardware solutions implement posits. [14] [17] [18] [19] [20] The first complete parameterized posit arithmetic hardware generator was proposed in 2018. [21]

Unum implementations have been explored in Julia [22] [23] [24] [25] [26] [27] and MATLAB. [28] [29] A C++ version [30] that supports any posit size combined with any number of exponent bits is available. SoftPosit, [31] a fast C implementation based on Berkeley SoftFloat and provided by the NGA research team, adds to the available software implementations.







| Project | Author | Type | Precisions | Quire support | Speed | Testing | Notes |
|---|---|---|---|---|---|---|---|
| | | World's first FPGA GPGPU | 32 | Yes | ~3.2 TPOPS | Exhaustive. No known bugs. | RacEr GP-GPU has 512 cores |
| SoftPosit | | C library based on Berkeley SoftFloat; C++ wrapper to override operators; Python wrapper using SWIG of SoftPosit | 8, 16, 32 published and complete | Yes | ~60 to 110 MPOPS on x86 core (Broadwell) | 8: Exhaustive; 16: Exhaustive except FMA, quire; 32: Exhaustive test is still in progress. No known bugs. | Open source license. Fastest and most comprehensive C library for posits presently. Designed for plug-in comparison of IEEE floats and posits. |
| | | Mathematica notebook | All | Yes | < 80 KPOPS | Exhaustive for low precisions. No known bugs. | Open source (MIT license). Original definition and prototype. Most complete environment for comparing IEEE floats and posits. Many examples of use, including linear solvers |
| | | JavaScript widget | Convert decimal to posit 6, 8, 16, 32; generate tables 2–17 with es 1–4 | N/A | N/A; interactive widget | Fully tested | Table generator and conversion |
| | Stillwater Supercomputing, Inc | C++ template library; C library; Python wrapper; Golang library | Arbitrary precision posit, float, valid (p); Unum type 1 (p); Unum type 2 (p) | Arbitrary quire configurations with programmable capacity | posit<4,0> 1 GPOPS; posit<8,0> 130 MPOPS; posit<16,1> 115 MPOPS; posit<32,2> 105 MPOPS; posit<64,3> 50 MPOPS; posit<128,4> 1 MPOPS; posit<256,5> 800 KPOPS | Complete validation suite for arbitrary posits. Randoms for large posit configs. Uses induction to prove nbits+1 is correct. No known bugs. | Open source. MIT license. Fully integrated with C/C++ types and automatic conversions. Supports full C++ math library (native and conversion to/from IEEE). Runtime integrations: MTL4/MTL5, Eigen, Trilinos, HPR-BLAS. Application integrations: G+SMO, FDBB, FEniCS, ODEintV2, TVM.ai. Hardware accelerator integration (Xilinx, Intel, Achronix). |
| | Chung Shin Yee | Python library | All | No | ~20 MPOPS | Extensive; no known bugs | Open source (MIT license) |
| | David Thien | SoftPosit bindings for Racket | All | Yes | Unknown | Unknown | |
| | Bill Zorn | SoftPosit bindings for Python | All | Yes | ~20–45 MPOPS on 4.9 GHz Skylake core | Unknown | |
| | Diego Coelho | Octave implementation | All | No | Unknown | Limited testing; no known bugs | GNU GPL |
| Sigmoid Numbers | Isaac Yonemoto | Julia library | All <32, all ES | Yes | Unknown | No known bugs (posits). Division bugs (valids) | Leverages Julia's templated mathematics standard library, can natively do matrix and tensor operations, complex numbers, FFT, DiffEQ. Support for valids |
| | Isaac Yonemoto | Julia and C/C++ library | 8, 16, 32, all ES | No | Unknown | Known bug in 32-bit multiplication | Used by LLNL in shock studies |
| | Milan Klöwer | Julia library | Based on softposit; 8-bit (es=0..2), 16-bit (es=0..2), 24-bit (es=1..2), 32-bit (es=2) | Yes | Similar to A*STAR "SoftPosit" (Cerlane Leong) | Posit (8,0), Posit (16,1), Posit (32,2). Other formats lack full functionality | Open source. Issues and suggestions on GitHub. This project was developed because SigmoidNumbers and FastSigmoid by Isaac Yonemoto are not currently maintained. Supports basic linear algebra functions in Julia (matrix multiplication, matrix solve, eigendecomposition, etc.) |
| | Ken Mercado | Python library | All | Yes | < 20 MPOPS | Unknown | Open source (MIT license). Easy-to-use interface. Neural net example. Comprehensive functions support. |
| | Federico Rossi, Emanuele Ruffaldi | C++ library | 4 to 64 (any es value); "Template version is 2 to 63 bits" | No | Unknown | A few basic tests | 4 levels of operations working with posits. Special support for NaN types (non-standard) |
| bfp: Beyond Floating Point | Clément Guérin | C++ library | Any | No | Unknown | Bugs found; status of fixes unknown | Supports + – × ÷ √ reciprocal, negate, compare |
| | Isaac Yonemoto | Julia and Verilog | 8, 16, 32, ES=0 | No | Unknown | Comprehensively tested for 8-bit, no known bugs | Intended for deep learning applications. Addition, subtraction and multiplication only. A proof of concept matrix multiplier has been built, but is off-spec in its precision |
| Lombiq Arithmetics | Lombiq Technologies | C# with Hastlayer for hardware generation | 8, 16, 32 (64 bits in progress) | | | Partial | Requires Microsoft .Net APIs |
| Deepfloat | Jeff Johnson, Facebook | SystemVerilog | Any (parameterized SystemVerilog) | Yes | N/A (RTL for FPGA/ASIC designs) | Limited | Does not strictly conform to posit spec. Supports +, -, /, *. Implements both logarithmic posit and normal, "linear" posits. License: CC-BY-NC 4.0 at present |
| | Tokyo Tech | FPGA | 16, 32, extendable | No | "2 GHz", not translated to MPOPS | Partial; known rounding bugs | Yet to be open-source |
| PACoGen: Posit Arithmetic Core Generator | Manish Kumar Jaiswal | Verilog HDL for posit arithmetic | Any precision. Able to generate any combination of word-size (N) and exponent-size (ES) | No | Speed of design is based on the underlying hardware platform (ASIC/FPGA) | Exhaustive tests for 8-bit posit. Multi-million random tests are performed for up to 32-bit posit with various ES combinations | Supports the round-to-nearest rounding method. |
| | Vinay Saxena, Research and Technology Centre, Robert Bosch, India (RTC-IN) and Farhad Merchant, RWTH Aachen University | Verilog generator for VLSI, FPGA | All | No | Similar to floats of same bit size | N=8; ES=2 with N=7,8,9,10,11,12; selective (20000*65536) combinations for ES=1 with N=16 | To be used in commercial products. To the best of the authors' knowledge, the first integration of posits in RISC-V. |
| Posit-enabled RISC-V core | Sugandha Tiwari, Neel Gala, Chester Rebeiro, V. Kamakoti (IIT Madras) | BSV (Bluespec System Verilog) implementation | 32-bit posit with (es=2) and (es=3) | No | | Verified against SoftPosit for (es=2) and tested with several applications for (es=2) and (es=3). No known bugs. | First complete posit-capable RISC-V core. Supports dynamic switching between (es=2) and (es=3). |
| PERCIVAL | David Mallasén | Open-source posit RISC-V core with quire capability | Posit<32,2> with 512-bit quire | Yes | Speed of design is based on the underlying hardware platform (ASIC/FPGA) | Functionality testing of each posit instruction. | Application-level posit-capable RISC-V core based on CVA6 that can execute all posit instructions, including the quire fused operations. PERCIVAL is the first work that integrates the complete posit ISA and quire in hardware. It allows the native execution of posit instructions as well as the standard floating-point ones simultaneously. |
| | Chris Lomont | Single-file C#, MIT licensed | Any size | No | | Extensive; no known bugs | Ops: arithmetic, comparisons, sqrt, sin, cos, tan, acos, asin, atan, pow, exp, log |
| | REX Computing | FPGA version of the "Neo" VLIW processor with posit numeric unit | 32 | No | ~1.2 GPOPS | Extensive; no known bugs | No divide or square root. First full processor design to replace floats with posits. |
| PNU: Posit Numeric Unit | Calligo Tech | World's first posit-enabled ASIC with octa-core RISC-V processor and quire implemented. PCIe accelerator card with this silicon will be ready June 2024. Full software stack with compilers, debugger, IDE environment and math libraries for applications; C, C++, Python languages supported. Applications tested successfully: image and video compression, more to come | <32, 2> with 512-bit quire support; <64, 3> | Yes, fully supported | 500 MHz × 8 cores | Exhaustive tests completed for 32 bits and 64 bits with quire support. Applications tested and being made available for seamless adoption (www.calligotech.com) | Fully integrated with C/C++ types and automatic conversions. Supports full C++ math library (native and conversion to/from IEEE). Runtime integrations: GNU Utils, OpenBLAS, CBLAS. Application integrations: in progress. Compiler support extended: C/C++, G++, GFortran & LLVM (in progress). |
| | Jianyu Chen | Specific-purpose FPGA | 32 | Yes | 16–64 GPOPS | Only one known case tested | Does 128-by-128 matrix-matrix multiplication (SGEMM) using quire. |
| Deep PeNSieve | Raul Murillo | Python library (software) | 8, 16, 32 | Yes | Unknown | Unknown | A DNN framework using posits |
| | Jaap Aarts | Pure Go library | 16/1, 32/2 (included is a generic 32/ES for ES<32) [clarification needed] | No | 80 MPOPS for div32/2 and similar linear functions. Much higher for truncate and much lower for exp. | Fuzzing against C softposit with a lot of iterations for 16/1 and 32/2. Explicitly testing edge cases found. | MIT license. Where ES is constant, the code is generated. The generator should be able to generate for all sizes {8, 16, 32} and ES below the size. However, the ones not included in the library by default are not tested, fuzzed, or supported. For some operations on 32/ES, mixing and matching ES is possible. However, this is not tested. |


SoftPosit [31] is a software implementation of posits based on Berkeley SoftFloat. [32] It allows software comparison between posits and floats. Alongside posit arithmetic such as addition and quire-based fused dot products, it provides helper functions to:

  • convert double to posit
  • convert posit to double
  • cast unsigned integer to posit

It works for 16-bit posits with one exponent bit and 8-bit posits with zero exponent bits. Support for 32-bit posits and a flexible type (2 to 32 bits with two exponent bits) is pending validation. It supports x86_64 systems and has been tested with GNU gcc (SUSE Linux) 4.8.5 and Apple LLVM version 9.1.0 (clang-902.0.39.2).


Add with posit8_t

```c
#include <stdint.h>
#include <stdio.h>
#include "softposit.h"

int main(int argc, char *argv[]) {
    posit8_t pA, pB, pZ;
    pA = castP8(0xF2);
    pB = castP8(0x23);
    pZ = p8_add(pA, pB);

    // To check answer by converting it to double
    double dZ = convertP8ToDouble(pZ);
    printf("dZ: %.15f\n", dZ);

    // To print result in binary (warning: non-portable code)
    uint8_t uiZ = castUI8(pZ);
    printBinary((uint64_t*)&uiZ, 8);

    return 0;
}
```

Fused dot product with quire16_t

```c
// Convert double to posit
posit16_t pA = convertDoubleToP16(1.02783203125);
posit16_t pB = convertDoubleToP16(0.987060546875);
posit16_t pC = convertDoubleToP16(0.4998779296875);
posit16_t pD = convertDoubleToP16(0.8797607421875);

quire16_t qZ;

// Set quire to 0
qZ = q16_clr(qZ);

// Accumulate products without roundings
qZ = q16_fdp_add(qZ, pA, pB);
qZ = q16_fdp_add(qZ, pC, pD);

// Convert back to posit
posit16_t pZ = q16_to_p16(qZ);

// To check answer
double dZ = convertP16ToDouble(pZ);
```


William M. Kahan, the principal architect of IEEE 754-1985, criticizes Type I unums on the following grounds (some are addressed in the Type II and Type III standards): [6] [33]


