Sentinel value

Last updated

In computer programming, a sentinel value (also referred to as a flag value, trip value, rogue value, signal value, or dummy data) is a special value in the context of an algorithm which uses its presence as a condition of termination, typically in a loop or recursive algorithm.

Contents

The sentinel value is a form of in-band data that makes it possible to detect the end of the data when no out-of-band data (such as an explicit size indication) is provided. The value should be selected in such a way that it is guaranteed to be distinct from all legal data values since otherwise, the presence of such values would prematurely signal the end of the data (the semipredicate problem). A sentinel value is sometimes known as an "Elephant in Cairo," due to a joke where this is used as a physical sentinel. In safe languages, most sentinel values could be replaced with option types, which enforce explicit handling of the exceptional case.

Examples

Some examples of common sentinel values and their uses:

Variants

A related practice, used in slightly different circumstances, is to place some specific value at the end of the data, in order to avoid the need for an explicit test for termination in some processing loop, because the value will trigger termination by the tests already present for other reasons. Unlike the above uses, this is not how the data is naturally stored or processed, but is instead an optimization, compared to the straightforward algorithm that checks for termination. This is typically used in searching. [1] [2]

For instance, when searching for a particular value in an unsorted list, every element will be compared against this value, with the loop terminating when equality is found; however, to deal with the case that the value should be absent, one must also test after each step for having completed the search unsuccessfully. By appending the value searched for to the end of the list, an unsuccessful search is no longer possible, and no explicit termination test is required in the inner loop. After the search, one must decide whether a true match was found, but this test needs to be performed only once rather than at each iteration. [3] Knuth calls the value so placed at the end of the data, a dummy value rather than a sentinel.

Examples

Array

For example, if searching for a value in an array in C, a straightforward implementation is as follows; note the use of a negative number (invalid index) to solve the semipredicate problem of returning "no result":

intfind(intarr[],size_tlen,intval){for(inti=0;i<len;i++)if(arr[i]==val)returni;return-1;// not found}

However, this does two tests at each iteration of the loop: whether the value has been found and whether the end of the array has been reached. This latter test is what is avoided by using a sentinel value. Assuming the array can be extended by one element (without memory allocation or cleanup; this is more realistic for a linked list, as below), this can be rewritten as:

intfind(intarr[],size_tlen,intval){inti;arr[len]=val;// add sentinel valuefor(i=0;;i++)if(arr[i]==val)break;if(i<len)returni;elsereturn-1;// not found}

The test for i < len is still present, but it has been moved outside the loop, which now contains only a single test (for the value), and is guaranteed to terminate due to the sentinel value. There is a single check on termination if the sentinel value has been hit, which replaces a test for each iteration.

It is also possible to temporarily replace the last element of the array by a sentinel and handle it, especially if it is reached:

intfind(intarr[],size_tlen,intval){intlast;if(len==0)return-1;last=arr[len-1];arr[len-1]=val;// add sentinel valueinti;for(i=0;;i++)if(arr[i]==val)break;arr[len-1]=last;if(arr[i]==val)returni;elsereturn-1;// not found}

See also

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Insertion sort</span> Sorting algorithm

Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that data access time is linear in respect to the number of nodes in the list. Because nodes are serially linked, accessing any node requires that the prior node be accessed beforehand. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

In computer science, linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.

In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(n2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity and has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

In computer science, control flow is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language.

<span class="mw-page-title-main">Interpolation search</span> Searching algorithm

Interpolation search is an algorithm for searching for a key in an array that has been ordered by numerical values assigned to the keys. It was first described by W. W. Peterson in 1957. Interpolation search resembles the method by which people search a telephone directory for a name : in each step the algorithm calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will only work if calculations on the size of differences between key values are sensible.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

<span class="mw-page-title-main">For loop</span> Control flow statement for repeated execution

In computer science a for-loop or for loop is a control flow statement for specifying iteration. Specifically, a for loop functions by running a section of code repeatedly until a certain condition has been satisfied.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

In computer science, the Boyer–Moore string-search algorithm is an efficient string-searching algorithm that is the standard benchmark for practical string-search literature. It was developed by Robert S. Boyer and J Strother Moore in 1977. The original paper contained static tables for computing the pattern shifts without an explanation of how to produce them. The algorithm for producing the tables was published in a follow-on paper; this paper contained errors which were later corrected by Wojciech Rytter in 1980.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

The bitap algorithm is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance – if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast.

In computer programming, a sentinel node is a specifically designated node used with linked lists and trees as a traversal path terminator. This type of node does not hold or reference any data managed by the data structure.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value. The problem is that the caller of the subroutine cannot tell what the result means in this case.

An American flag sort is an efficient, in-place variant of radix sort that distributes items into buckets. Non-comparative sorting algorithms such as radix sort and American flag sort are typically used to sort large objects such as strings, for which comparison is not a unit-time operation. American flag sort iterates through the bits of the objects, considering several bits of each object at a time. For each set of bits, American flag sort makes two passes through the array of objects: first to count the number of objects that will fall in each bin, and second to place each object in its bucket. This works especially well when sorting a byte at a time, using 256 buckets. With some optimizations, it is twice as fast as quicksort for large sets of strings.

<span class="mw-page-title-main">Computation of cyclic redundancy checks</span> Overview of the computation of cyclic redundancy checks

Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive or operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster through byte-wise parallelism and space–time tradeoffs.

References

  1. Mehlhorn, Kurt; Sanders, Peter (2008). "3. Representing Sequences by Arrays and Linked Lists" (PDF). Algorithms and Data Structures: The Basic Toolbox. Springer. p. 63. ISBN   978-3-540-77977-3.
  2. McConnell, Steve (2004). Code Complete (2nd ed.). Redmond: Microsoft Press. p. 621. ISBN   0-7356-1967-0.
  3. Knuth, Donald (1973). Sorting and searching. The Art of Computer Programming. Vol. 3. Addison-Wesley. p. 395. ISBN   0-201-03803-X.