Serial number arithmetic

Last updated

Many protocols and algorithms require the serialization or enumeration of related entities. For example, a communication protocol must know whether some packet comes "before" or "after" some other packet. The IETF (Internet Engineering Task Force) RFC   1982 attempts to define "serial number arithmetic" for the purposes of manipulating and comparing these sequence numbers. In short, when the absolute serial number value decreases by more than half of the maximum value (e.g. 128 in an 8-bit value), it is considered to be "after" the former, whereas other decreases are considered to be "before".

Contents

This task is rather more complex than it might first appear, because most algorithms use fixed-size (binary) representations for sequence numbers. It is often important for the algorithm not to "break down" when the numbers become so large that they are incremented one last time and "wrap" around their maximum numeric ranges (go instantly from a large positive number to 0 or a large negative number). Some protocols choose to ignore these issues and simply use very large integers for their counters, in the hope that the program will be replaced (or they will retire) before the problem occurs (see Y2K).

Many communication protocols apply serial number arithmetic to packet sequence numbers in their implementation of a sliding window protocol. Some versions of TCP use protection against wrapped sequence numbers (PAWS). PAWS applies the same serial number arithmetic to packet timestamps, using the timestamp as an extension of the high-order bits of the sequence number. [1]

Operations on sequence numbers

Only addition of a small positive integer to a sequence number and comparison of two sequence numbers are discussed. Only unsigned binary implementations are discussed, with an arbitrary size in bits noted throughout the RFC (and below) as "SERIAL_BITS".

Addition

Adding an integer to a sequence number is simple unsigned integer addition, followed by unsigned modulo operation to bring the result back into range (usually implicit in the unsigned addition, on most architectures):

s' = (s + n) modulo 2SERIAL_BITS

Addition of a value below 0 or above 2SERIAL_BITS−1 − 1 is undefined. Basically, adding values beyond this range will cause the resultant sequence number to "wrap", and (often) result in a number that is considered "less than" the original sequence number.

Comparison

A means of comparing two sequence numbers i1 and i2 (the unsigned integer representations of sequence numbers s1 and s2) is presented.

Equality is defined as simple numeric equality.

The algorithm presented for comparison is complex, having to take into account whether the first sequence number is close to the "end" of its range of values, and thus a smaller "wrapped" number may actually be considered "greater" than the first sequence number. Thus i1 is considered less than i2 only if

(i1 < i2 and i2i1 < 2SERIAL_BITS−1) or
(i1 > i2 and i1i2 > 2SERIAL_BITS−1)

Shortfalls

The algorithms presented by the RFC have at least one significant shortcoming: there are sequence numbers for which comparison is undefined. Since many algorithms are implemented independently by multiple independent cooperating parties, it is often impossible to prevent all such situations from occurring.

The authors of RFC   1982 acknowledge this without offering a general solution:

While it would be possible to define the test in such a way that the inequality would not have this surprising property, while being defined for all pairs of values, such a definition would be unnecessarily burdensome to implement, and difficult to understand, and would still allow cases where

s1 < s2 and (s1 + 1) > (s2 + 1) 

which is just as non-intuitive.

Thus the problem case is left undefined, implementations are free to return either result, or to flag an error, and users must take care not to depend on any particular outcome. Usually this will mean avoiding allowing those particular pairs of numbers to co-exist.

Thus, it is often difficult or impossible to avoid all "undefined" comparisons of sequence numbers. However, a relatively simple solution is available. By mapping the unsigned sequence numbers onto signed two's complement arithmetic operations, every comparison of any sequence number is defined, and the comparison operation itself is dramatically simplified. All comparisons specified by the RFC retain their original truth values; only the formerly "undefined" comparisons are affected.

General solution

The RFC   1982 algorithm specifies that, for N-bit sequence numbers, there are 2N−1  1 values considered "greater than" and 2N−1  1 considered "less than". Comparison against the remaining value (exactly 2N−1-distant) is deemed to be "undefined".

Most modern hardware implements signed two's complement binary arithmetic operations. These operations are fully defined for the entire range of values for any operands they are given, since any N-bit binary number can contain 2N distinct values, and since one of them is taken up by the value 0, there are an odd number of spots left for all the non-zero positive and negative numbers. There is simply one more negative number representable than there are positive. For example, a 16-bit 2's complement value may contain numbers ranging from −32768 to +32767.

So, if we simply re-cast sequence numbers as 2's complement integers and allow there to be one more sequence number considered "less than" than there are sequence numbers considered "greater than", we should be able to use simple signed arithmetic comparisons instead of the logically incomplete formula proposed by the RFC.

Here are some examples (in 16 bits, again), comparing some random sequence numbers, against the sequence number with the value 0:

unsigned    binary    signed sequence    value     distance --------    ------    --------    32767 == 0x7FFF ==  32767        1 == 0x0001 ==      1        0 == 0x0000 ==      0    65535 == 0xFFFF ==     −1     65534 == 0xFFFE ==     −2    32768 == 0x8000 == −32768

It is easy to see that the signed interpretation of the sequence numbers are in the correct order, so long as we "rotate" the sequence number in question so that its 0 matches up with the sequence number we are comparing it against. It turns out that this is simply done using an unsigned subtraction and simply interpreting the result as a signed two's complement number. The result is the signed "distance" between the two sequence numbers. Once again, if i1 and i2 are the unsigned binary representations of the sequence numbers s1 and s2, the distance from s1 to s2 is

distance=(signed)(i1-i2)

If distance is 0, the numbers are equal. If it is < 0, then s1 is "less than" or "before" s2. Simple, clean and efficient, and fully defined. However, not without surprises.

All sequence number arithmetic must deal with "wrapping" of sequence numbers; the number 2N−1 is equidistant in both directions, in RFC   1982 sequence number terms. In our math, they are both considered to be "less than" each other:

distance1=(signed)(0x8000-0x0)==(signed)0x8000==-32768<0distance2=(signed)(0x0-0x8000)==(signed)0x8000==-32768<0

This is obviously true for any two sequence numbers with distance of 0x8000 between them.

Furthermore, implementing serial number arithmetic using two's complement arithmetic implies serial numbers of a bit-length matching the machine's integer sizes; usually 16-bit, 32-bit and 64-bit. Implementing 20-bit serial numbers needs shifts (assuming 32-bit ints):

distance=(signed)((i1<<12)-(i2<<12))

See also

Related Research Articles

In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.

<span class="mw-page-title-main">Arithmetic shift</span> Shift operator in computer programming

In computer programming, an arithmetic shift is a shift operator, sometimes termed a signed shift. The two basic types are the arithmetic left shift and the arithmetic right shift. For binary numbers it is a bitwise operation that shifts all of the bits of its operand; every bit in the operand is simply moved a given number of bit positions, and the vacant bit-positions are filled in. Instead of being filled with all 0s, as in logical shift, when shifting to the right, the leftmost bit is replicated to fill in all the vacant positions.

Double-precision floating-point format is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.

Two's complement is the most common method of representing signed integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.

In computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents. More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.

In computing, signed number representations are required to encode negative numbers in binary number systems.

In IEEE 754 floating-point numbers, the exponent is biased in the engineering sense of the word – the value stored is offset from the actual value by the exponent bias, also called a biased exponent. Biasing is done because exponents have to be signed values in order to be able to represent both tiny and huge values, but two's complement, the usual representation for signed values, would make comparison harder.

In computing, signedness is a property of data types representing numbers in computer programs. A numeric variable is signed if it can represent both positive and negative numbers, and unsigned if it can only represent non-negative numbers.

In computer science, a logical shift is a bitwise operation that shifts all the bits of its operand. The two base variants are the logical left shift and the logical right shift. This is further modulated by the number of bit positions a given value shall be shifted, such as shift left by 1 or shift right by n. Unlike an arithmetic shift, a logical shift does not preserve a number's sign bit or distinguish a number's exponent from its significand (mantissa); every bit in the operand is simply moved a given number of bit positions, and the vacant bit-positions are filled, usually with zeros, and possibly ones.

<span class="mw-page-title-main">Integer overflow</span> Computer arithmetic error

In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximum or lower than the minimum representable value.

<span class="mw-page-title-main">C data types</span> Data types supported by the C programming language

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

Signed zero is zero with an associated sign. In ordinary arithmetic, the number 0 does not have a sign, so that −0, +0 and 0 are equivalent. However, in computing, some number representations allow for the existence of two zeros, often denoted by −0 and +0, regarded as equal by the numerical comparison operations but with possible different behaviors in particular operations. This occurs in the sign-magnitude and ones' complement signed number representations for integers, and in most floating-point number representations. The number 0 is usually encoded as +0, but can still be represented by +0, −0, or 0.

In computer processors, the overflow flag is usually a single bit in a system status register used to indicate when an arithmetic overflow has occurred in an operation, indicating that the signed two's-complement result would not fit in the number of bits used for the result. Some architectures may be configured to automatically generate an exception on an operation resulting in overflow.

The Q notation is a way to specify the parameters of a binary fixed point number format. For example, in Q notation, the number format denoted by Q8.8 means that the fixed point numbers in this format have 8 bits for the integer part and 8 bits for the fraction part.

A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers.

In computing, bit numbering is the convention used to identify the bit positions in a binary number.

Offset binary, also referred to as excess-K, excess-N, excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned number n+K, K being the biasing value or offset. There is no standard for offset binary, but most often the K for an n-bit binary word is K = 2n−1 (for example, the offset for a four-digit binary number would be 23=8). This has the consequence that the minimal negative value is represented by all-zeros, the "zero" value is represented by a 1 in the most significant bit and zero in all other bits, and the maximal positive value is represented by all-ones (conveniently, this is the same as using two's complement but with the most significant bit inverted). It also has the consequence that in a logical comparison operation, one gets the same result as with a true form numerical comparison operation, whereas, in two's complement notation a logical comparison will agree with true form numerical comparison operation if and only if the numbers being compared have the same sign. Otherwise the sense of the comparison will be inverted, with all negative values being taken as being larger than all positive values.

The ones' complement of a binary number is the value obtained by inverting (flipping) all the bits in the binary representation of the number. The name "ones' complement" refers to the fact that such an inverted value, if added to the original, would always produce an "all ones" number. This mathematical operation is primarily of interest in computer science, where it has varying effects depending on how a specific computer represents numbers.

<span class="mw-page-title-main">Binary angular measurement</span>

Binary angular measurement (BAM) is a measure of angles using binary numbers and fixed-point arithmetic, in which a half turn is represented by the value 1. The unit of angular measure used in those methods may be called binary radian (brad) or binary degree.

References

  1. RFC   1323: "TCP Extensions for High Performance", section 4.2.