Q (number format)

Last updated April 16, 2024

The Q notation is a way to specify the parameters of a binary fixed point number format. For example, in Q notation, the number format denoted by Q8.8 means that the fixed point numbers in this format have 8 bits for the integer part and 8 bits for the fraction part.

Definition

Texas Instruments version

The Q notation, as defined by Texas Instruments,^[1] consists of the letter Q followed by a pair of numbers m.n, where m is the number of bits used for the integer part of the value, and n is the number of fraction bits.

By default, the notation describes signed binary fixed point format, with the unscaled integer being stored in two's complement format, used in most binary processors. The first bit always gives the sign of the value(1 = negative, 0 = non-negative), and it is not counted in the m parameter. Thus, the total number w of bits used is 1 + m + n.

For example, the specification Q3.12 describes a signed binary fixed-point number with a w = 16 bits in total, comprising the sign bit, three bits for the integer part, and 12 bits that are the fraction. That is, a 16-bit signed (two's complement) integer, that is implicitly multiplied by the scaling factor 2⁻¹²

In particular, when n is zero, the numbers are just integers. If m is zero, all bits except the sign bit are fraction bits; then the range of the stored number is from −1.0 (inclusive) to +1.0 (exclusive).

The m and the dot may be omitted, in which case they are inferred from the size of the variable or register where the value is stored. Thus, Q12 means a signed integer with any number of bits, that is implicitly multiplied by 2⁻¹².

The letter U can be prefixed to the Q to denote an unsigned binary fixed-point format. For example, UQ1.15 describes values represented as unsigned 16-bit integers with an implicit scaling factor of 2⁻¹⁵, which range from 0.0 to (2¹⁶−1)/2¹⁵ = +1.999969482421875.

ARM version

A variant of the Q notation has been in use by ARM. In this variant, the m number includes the sign bit. For example, a 16-bit signed integer would be denoted Q15.0 in the TI variant, but Q16.0 in the ARM variant.^[2]^[3]

Characteristics

The resolution (difference between successive values) of a Qm.n or UQm.n format is always 2⁻ⁿ. The range of representable values depends on the notation used:

Range of representable values in Q notation
Notation	Texas Instruments Notation	ARM Notation
Signed Qm.n	−2^m to +2^m − 2⁻ⁿ	−2^m−1 to +2^m−1 − 2⁻ⁿ
Unsigned UQm.n	0 to 2^m − 2⁻ⁿ	0 to 2^m − 2⁻ⁿ

For example, a Q15.1 format number requires 15+1 = 16 bits, has resolution 2⁻¹ = 0.5, and the representable values range from −2¹⁴ = −16384.0 to +2¹⁴− 2⁻¹ = +16383.5. In hexadecimal, the negative values range from 0x8000 to 0xFFFF followed by the non-negative ones from 0x0000 to 0x7FFF.

Math operations

Q numbers are a ratio of two integers: the numerator is kept in storage, the denominator $d$ is equal to 2ⁿ.

Consider the following example:

The Q8 denominator equals 2⁸ = 256
1.5 equals 384/256
384 is stored, 256 is inferred because it is a Q8 number.

If the Q number's base is to be maintained (n remains constant) the Q number math operations must keep the denominator $d$ constant. The following formulas show math operations on the general Q numbers $N_{1}$ and $N_{2}$ . (If we consider the example as mentioned above, $N_{1}$ is 384 and $d$ is 256.)

${\begin{aligned}{\frac {N_{1}}{d}}+{\frac {N_{2}}{d}}&={\frac {N_{1}+N_{2}}{d}}\\{\frac {N_{1}}{d}}-{\frac {N_{2}}{d}}&={\frac {N_{1}-N_{2}}{d}}\\\left({\frac {N_{1}}{d}}\times {\frac {N_{2}}{d}}\right)\times d&={\frac {N_{1}\times N_{2}}{d}}\\\left({\frac {N_{1}}{d}}/{\frac {N_{2}}{d}}\right)/d&={\frac {N_{1}/N_{2}}{d}}\end{aligned}}$

Because the denominator is a power of two, the multiplication can be implemented as an arithmetic shift to the left and the division as an arithmetic shift to the right; on many processors shifts are faster than multiplication and division.

To maintain accuracy, the intermediate multiplication and division results must be double precision and care must be taken in rounding the intermediate result before converting back to the desired Q number.

Using C the operations are (note that here, Q refers to the fractional part's number of bits) :

Addition

int16_tq_add(int16_ta,int16_tb){returna+b;}

With saturation

int16_tq_add_sat(int16_ta,int16_tb){int16_tresult;int32_ttmp;tmp=(int32_t)a+(int32_t)b;if(tmp>0x7FFF)tmp=0x7FFF;if(tmp<-1*0x8000)tmp=-1*0x8000;result=(int16_t)tmp;returnresult;}

Unlike floating point ±Inf, saturated results are not sticky and will unsaturate on adding a negative value to a positive saturated value (0x7FFF) and vice versa in that implementation shown. In assembly language, the Signed Overflow flag can be used to avoid the typecasts needed for that C implementation.

Subtraction

int16_tq_sub(int16_ta,int16_tb){returna-b;}

Multiplication

// precomputed value:#define K   (1 << (Q - 1))// saturate to range of int16_tint16_tsat16(int32_tx){if(x>0x7FFF)return0x7FFF;elseif(x<-0x8000)return-0x8000;elsereturn(int16_t)x;}int16_tq_mul(int16_ta,int16_tb){int16_tresult;int32_ttemp;temp=(int32_t)a*(int32_t)b;// result type is operand's type// Rounding; mid values are rounded uptemp+=K;// Correct by dividing by base and saturate resultresult=sat16(temp>>Q);returnresult;}

Division

int16_tq_div(int16_ta,int16_tb){/* pre-multiply by the base (Upscale to Q16 so that the result will be in Q8 format) */int32_ttemp=(int32_t)a<<Q;/* Rounding: mid values are rounded up (down for negative values). *//* OR compare most significant bits i.e. if (((temp >> 31) & 1) == ((b >> 15) & 1)) */if((temp>=0&&b>=0)||(temp<0&&b<0)){temp+=b/2;/* OR shift 1 bit i.e. temp += (b >> 1); */}else{temp-=b/2;/* OR shift 1 bit i.e. temp -= (b >> 1); */}return(int16_t)(temp/b);}

Related Research Articles

In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers $n \geq k \geq 0$ and is written $It is the coefficient of the x k term in the polynomial expansion of the binomial power (1 + x) n; this coefficient can be computed by the multiplicative formula$

In mathematics and computer programming, exponentiating by squaring is a general method for fast computation of large positive integer powers of a number, or more generally of an element of a semigroup, like a polynomial or a square matrix. Some variants are commonly referred to as square-and-multiply algorithms or binary exponentiation. These can be of quite general use, for example in modular arithmetic or powering of matrices. For semigroups for which additive notation is commonly used, like elliptic curves used in cryptography, this method is also referred to as double-and-add.

In mathematics, the gamma function is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers except the non-positive integers. For every positive integer $n$ ,

In mathematics, a square-free integer (or squarefree integer) is an integer which is divisible by no square number other than 1. That is, its prime factorization has exactly one factor for each prime that appears in it. For example, 10 = 2 ⋅ 5 is square-free, but 18 = 2 ⋅ 3 ⋅ 3 is not, because 18 is divisible by 9 = 3². The smallest positive square-free numbers are

In mathematics, a dyadic rational or binary rational is a number that can be expressed as a fraction whose denominator is a power of two. For example, 1/2, 3/2, and 3/8 are dyadic rationals, but 1/3 is not. These numbers are important in computer science because they are the only ones with finite binary representations. Dyadic rationals also have applications in weights and measures, musical time signatures, and early mathematics education. They can accurately approximate any real number.

A mathematical symbol is a figure or a combination of figures that is used to represent a mathematical object, an action on mathematical objects, a relation between mathematical objects, or for structuring the other symbols that occur in a formula. As formulas are entirely constituted with symbols of various types, many symbols are needed for expressing all mathematics.

<span class="mw-page-title-main">Exponentiation</span> Arithmetic operation

In mathematics, exponentiation is an operation involving two numbers: the base and the exponent or power. Exponentiation is written as $b n$ , where $b$ is the base and $n$ is the power; this is pronounced as " $b$ (raised) to the $n$ ". When $n$ is a positive integer, exponentiation corresponds to repeated multiplication of the base: that is, $b n$ is the product of multiplying $n$ bases:

A one-instruction set computer (OISC), sometimes referred to as an ultimate reduced instruction set computer (URISC), is an abstract machine that uses only one instruction – obviating the need for a machine language opcode. With a judicious choice for the single instruction and given arbitrarily many resources, an OISC is capable of being a universal computer in the same manner as traditional computers that have multiple instructions. OISCs have been recommended as aids in teaching computer architecture and have been used as computational models in structural computing research. The first carbon nanotube computer is a 1-bit one-instruction set computer.

A triangular number or triangle number counts objects arranged in an equilateral triangle. Triangular numbers are a type of figurate number, other examples being square numbers and cube numbers. The $n$ th triangular number is the number of dots in the triangular arrangement with $n$ dots on each side, and is equal to the sum of the $n$ natural numbers from 1 to $n$ . The sequence of triangular numbers, starting with the 0th triangular number, is

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.

A power of two is a number of the form $2 n$ where $n$ is an integer, that is, the result of exponentiation with number two as the base and integer $n$ as the exponent. If $n$ is negative, $2 n$ is called an negative power of two or inverse power of two.

In computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents. More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.

In mathematics, the Gaussian binomial coefficients are q-analogs of the binomial coefficients. The Gaussian binomial coefficient, written as $or, is a polynomial in q with integer coefficients, whose value when q is set to a prime power counts the number of subspaces of dimension k in a vector space of dimension n over, a finite field with q elements; i.e. it is the number of points in the finite Grassmannian .$

In mathematics, particularly in combinatorics, a Stirling number of the second kind is the number of ways to partition a set of n objects into k non-empty subsets and is denoted by $or . Stirling numbers of the second kind occur in the field of mathematics called combinatorics and the study of partitions. They are named after James Stirling.$

Methods of computing square roots are algorithms for approximating the non-negative square root $of a positive real number . Since all square roots of natural numbers, other than of perfect squares, are irrational, square roots can usually only be computed to some finite precision: these methods typically construct a series of increasingly accurate approximations.$

A division algorithm is an algorithm which, given two integers N and D, computes their quotient and/or remainder, the result of Euclidean division. Some are applied by hand, while others are employed by digital circuit designs and software.

A negative base may be used to construct a non-standard positional numeral system. Like other place-value systems, each position holds multiples of the appropriate power of the system's base; but that base is negative—that is to say, the base $b$ is equal to $-r$ for some natural number $r$ .

In mathematics, a rational number is a number that can be expressed as the quotient or fraction of two integers, a numerator $p$ and a non-zero denominator $q$ . For example, is a rational number, as is every integer. The set of all rational numbers, also referred to as "the rationals", the field of rationals or the field of rational numbers is usually denoted by boldface $Q$ , or blackboard bold

<span class="mw-page-title-main">Fast inverse square root</span> Root-finding algorithm

Fast inverse square root, sometimes referred to as Fast InvSqrt or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates $, the reciprocal of the square root of a 32-bit floating-point number in IEEE 754 floating-point format. The algorithm is best known for its implementation in 1999 in Quake III Arena, a first-person shooter video game heavily based on 3D graphics. With subsequent hardware advancements, especially the x86 SSE instruction rsqrtss, this algorithm is not generally the best choice for modern computers, though it remains an interesting historical example.$

Single-precision floating-point format is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

References

↑ "Appendix A.2". TMS320C64x DSP Library Programmer's Reference (PDF). Dallas, Texas, USA: Texas Instruments Incorporated. October 2003. SPRU565. Archived (PDF) from the original on 2022-12-22. Retrieved 2022-12-22. (150 pages)
↑ "ARM Developer Suite AXD and armsd Debuggers Guide". 1.2. ARM Limited. 2001 [1999]. Chapter 4.7.9. AXD > AXD Facilities > Data formatting > Q-format. ARM DUI 0066D. Archived from the original on 2017-11-04.
↑ "Chapter 4.7.9. AXD > AXD Facilities > Data formatting > Q-format". RealView Development Suite AXD and armsd Debuggers Guide (PDF). 3.0. ARM Limited. 2006 [1999]. pp. 4–24. ARM DUI 0066G. Archived (PDF) from the original on 2017-11-04.

External links

"Q-Number-Format Java Implementation". GitHub . Archived from the original on 2017-11-04. Retrieved 2017-11-04.
"Q-format Converter". Archived from the original on 2021-06-25. Retrieved 2021-06-25.
"Q Library (C implementation)". GitHub . Retrieved 2024-03-05.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[TI_2003-1] "Appendix A.2". TMS320C64x DSP Library Programmer's Reference (PDF). Dallas, Texas, USA: Texas Instruments Incorporated. October 2003. SPRU565. Archived (PDF) from the original on 2022-12-22. Retrieved 2022-12-22. (150 pages)

[ARM_2001-2] "ARM Developer Suite AXD and armsd Debuggers Guide". 1.2. ARM Limited. 2001 [1999]. Chapter 4.7.9. AXD > AXD Facilities > Data formatting > Q-format. ARM DUI 0066D. Archived from the original on 2017-11-04.

[ARM_2006-3] "Chapter 4.7.9. AXD > AXD Facilities > Data formatting > Q-format". RealView Development Suite AXD and armsd Debuggers Guide (PDF). 3.0. ARM Limited. 2006 [1999]. pp. 4–24. ARM DUI 0066G. Archived (PDF) from the original on 2017-11-04.

[1]

[2]

[3]