Off-by-one error

Last updated

An off-by-one error or off-by-one bug (known by acronyms OBOE, OBO, OB1 and OBOB) is a logic error that involves a number that differs from its intended value by 1. An off-by-one error can sometimes appear in a mathematical context. It often occurs in computer programming when a loop iterates one time too many or too few, usually caused by the use of non-strict inequality (≤) as the terminating condition where strict inequality (<) should have been used, or vice versa. Off-by-one errors also stem from confusion over zero-based numbering.

Contents

Cases

Looping over arrays

Consider an array of items, and items m through n (inclusive) are to be processed. How many items are there? An intuitive answer may be nm, but that is off by one, exhibiting a fencepost error; the correct answer is nm + 1.

For this reason, ranges in computing are often represented by half-open intervals; the range from m to n (inclusive) is represented by the range from m (inclusive) to n + 1 (exclusive) to avoid fencepost errors. For example, a loop that iterates five times (from 0 to 4 inclusive) can be written as a half-open interval from 0 to 5:

for(index=0;index<5;index++){/* Body of the loop */}

The loop body is executed first of all with index equal to 0; index then becomes 1, 2, 3, and finally 4 on successive iterations. At that point, index becomes 5, so index < 5 is false and the loop ends. However, if the comparison used were <= (less than or equal to), the loop would be carried out six times: index takes the values 0, 1, 2, 3, 4, and 5. Likewise, if index were initialized to 1 rather than 0, there would only be four iterations: index takes the values 1, 2, 3, and 4. Both of these alternatives can cause off-by-one errors.

Another such error can occur if a do-while loop is used in place of a while loop (or vice versa.) A do-while loop is guaranteed to run at least once.

Array-related confusion may also result from differences in programming languages. Numbering from 0 is most common, but some languages start array numbering with 1. Pascal has arrays with user-defined indices. This makes it possible to model the array indices after the problem domain.

Fencepost error

A fencepost error (occasionally called a telegraph pole,lamp-post, or picket fence error) is a specific type of off-by-one error. An early description of this error appears in the works of Vitruvius. [1] The following problem illustrates the error:

If you build a straight fence 30 feet long with posts spaced 3 feet apart, how many posts do you need?

A straight fence with 10 sections requires 11 posts. More generally, n sections would require n + 1 posts. Fencepost error 01.svg
A straight fence with 10 sections requires 11 posts. More generally, n sections would require n + 1 posts.

The common answer of 10 posts is wrong. This response comes from dividing the length of the fence by the spacing apart from each post, with the quotient being erroneously classified as the number of posts. In actuality, the fence has 10 sections and 11 posts.

In this scenario, a fence with n sections will have n + 1 posts. Conversely, if the fence contains n posts, it will contain n − 1 sections. This relationship is important to consider when dealing with the reverse error. The reverse error occurs when the number of posts is known and the number of sections is assumed to be the same. Depending on the design of the fence, this assumption can be correct or incorrect.

The following problem demonstrates the reverse error:

If you have n posts, how many sections are there between them?

The interpretation for the fence's design changes the answer to this problem. The correct number of sections for a fence is n − 1 if the fence is a free-standing line segment bounded by a post at each of its ends (e.g., a fence between two passageway gaps), n if the fence forms one complete, free-standing loop (e.g., enclosure accessible by surmounting, such as a boxing ring), or n + 1 if posts do not occur at the ends of a line-segment-like fence (e.g., a fence between and wall-anchored to two buildings). The precise problem definition must be carefully considered, as the setup for one situation may give the wrong answer for other situations.

Fencepost errors can also occur in units other than length. For example, the Time Pyramid, consisting of 120 blocks placed at 10-year intervals between blocks, is scheduled to take 1,190 years to build (not 1,200), from the installation of the first block to the last block. One of the earliest fencepost errors involved time, where the Julian calendar originally calculated leap years incorrectly, due to counting inclusively rather than exclusively, yielding a leap year every three years rather than every four.

In larger numbers, being off by one is often not a major issue. In smaller numbers, however, and specific cases where accuracy is paramount, committing an off-by-one error can be disastrous. Sometimes such an issue will also be repeated and, therefore, worsened, by someone passing on an incorrect calculation, if the following person makes the same kind of mistake again (of course, the error might also be reversed).

An example of this error can occur in the computational language MATLAB with the linspace() linear interpolation function, whose parameters are (lower value, upper value, number of values) and not (lower value, upper value, number of increments). A programmer who misunderstands the third parameter to be the number of increments might hope that linspace(0,10,5) would achieve a sequence [0, 2, 4, 6, 8, 10] but instead would get [0, 2.5, 5, 7.5, 10].

Security implications

A common off-by-one error which results in a security-related bug is caused by misuse of the C standard library strncat routine. A common misconception with strncat is that the guaranteed null termination will not write beyond the maximum length. In reality it will write a terminating null character one byte beyond the maximum length specified. The following code contains such a bug:

voidfoo(char*s){charbuf[15];memset(buf,0,sizeof(buf));strncat(buf,s,sizeof(buf));// Final parameter should be: sizeof(buf)-1}

Off-by-one errors are common in using the C library because it is not consistent with respect to whether one needs to subtract 1 byte – functions like fgets() and strncpy will never write past the length given them (fgets() subtracts 1 itself, and only retrieves (length − 1) bytes), whereas others, like strncat will write past the length given them. So the programmer has to remember for which functions they need to subtract 1.

On some systems (little endian architectures in particular) this can result in the overwriting of the least significant byte of the frame pointer. This can cause an exploitable condition where an attacker can hijack the local variables for the calling routine.

One approach that often helps avoid such problems is to use variants of these functions that calculate how much to write based on the total length of the buffer, rather than the maximum number of characters to write. Such functions include strlcat and strlcpy , and are often considered "safer" because they make it easier to avoid accidentally writing past the end of a buffer. (In the code example above, calling strlcat(buf, s, sizeof(buf)) instead would remove the bug.)

See also

Related Research Articles

<span class="mw-page-title-main">Binary search</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

In computer programming, an iterator is an object that progressively provides access to each item of a collection, in order.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

<span class="mw-page-title-main">For loop</span> Control flow statement for repeated execution

In computer science, a for-loop or for loop is a control flow statement for specifying iteration. Specifically, a for-loop functions by running a section of code repeatedly until a certain condition has been satisfied.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler. On modern processors, loop unrolling is often counterproductive, as the increased code size can cause more cache misses; cf. Duff's device.

A bit array is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

<span class="mw-page-title-main">JavaScript syntax</span> Set of rules defining correctly structured programs

The syntax of JavaScript is the set of rules that define a correctly structured JavaScript program.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

setjmp.h is a header defined in the C standard library to provide "non-local jumps": control flow that deviates from the usual subroutine call and return sequence. The complementary functions setjmp and longjmp provide this functionality.

The polyhedral model is a mathematical framework for programs that perform large numbers of operations -- too large to be explicitly enumerated -- thereby requiring a compact representation. Nested loop programs are the typical, but not the only example, and the most common use of the model is for loop nest optimization in program optimization. The polyhedral method treats each loop iteration within nested loops as lattice points inside mathematical objects called polyhedra, performs affine transformations or more general non-affine transformations such as tiling on the polytopes, and then converts the transformed polytopes into equivalent, but optimized, loop nests through polyhedra scanning.

<span class="mw-page-title-main">Fisher–Yates shuffle</span> Algorithm for generating a random permutation of a finite set

The Fisher–Yates shuffle is an algorithm for shuffling a finite sequence. The algorithm takes a list of all the elements of the sequence, and continually determines the next element in the shuffled sequence by randomly drawing an element from the list until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. The modern version of the algorithm takes time proportional to the number of items being shuffled and shuffles them in place.

This article compares a large number of programming languages by tabulating their data types, their expression, statement, and declaration syntax, and some common operating-system interfaces.

<span class="mw-page-title-main">Computation of cyclic redundancy checks</span> Overview of the computation of cyclic redundancy checks

Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive or operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster through byte-wise parallelism and space–time tradeoffs.

<span class="mw-page-title-main">Control table</span> Data structures that control the execution order of computer commands

Control tables are tables that control the control flow or play a major part in program control. There are no rigid rules about the structure or content of a control table—its qualifying attribute is its ability to direct control flow in some way through "execution" by a processor or interpreter. The design of such tables is sometimes referred to as table-driven design. In some cases, control tables can be specific implementations of finite-state-machine-based automata-based programming. If there are several hierarchical levels of control table they may behave in a manner equivalent to UML state machines

The C programming language has a set of functions implementing operations on strings in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. For character strings, the standard library uses the convention that strings are null-terminated: a string of n characters is represented as an array of n + 1 elements, the last of which is a "NUL character" with numeric value 0.

References

Citations

  1. Moniot, Robert K., Who first described the "fence-post error?", Fordham University, archived from the original on 2016-03-05, retrieved 2016-07-07.

Sources

Further reading