Semipredicate problem

Last updated

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value. [1] The problem is that the caller of the subroutine cannot tell what the result means in this case.

Contents

Example

The division operation yields a real number, but fails when the divisor is zero. If we were to write a function that performs division, we might choose to return 0 on this invalid input. However, if the dividend is 0, the result is 0 too. This means there is no number we can return to uniquely signal attempted division by zero, since all real numbers are in the range of division.

Practical implications

Early programmers handled potentially exceptional cases such as division using a convention requiring the calling routine to verify the inputs before calling the division function. This had two problems: first, it greatly encumbered all code that performed division (a very common operation); second, it violated the Don't repeat yourself and encapsulation principles, the former of which suggesting eliminating duplicated code, and the latter suggesting that data-associated code be contained in one place (in this division example, the verification of input was done separately). For a computation more complicated than division, it could be difficult for the caller to recognize invalid input; in some cases, determining input validity may be as costly as performing the entire computation. The target function could also be modified and would then expect different preconditions than would the caller; such a modification would require changes in every place where the function was called.

Solutions

The semipredicate problem is not universal among functions that can fail.

Using a custom convention to interpret return values

If the range of a function does not cover the entire space corresponding to the data type of the function's return value, a value known to be impossible under normal computation can be used. For example, consider the function index, which takes a string and a substring, and returns the integer index of the substring in the main string. If the search fails, the function may be programmed to return -1 (or any other negative value), since this can never signify a successful result.

This solution has its problems, though, as it overloads the natural meaning of a function with an arbitrary convention:

Multivalued return

Many languages allow, through one mechanism or another, a function to return multiple values. If this is available, the function can be redesigned to return a boolean value signalling success or failure, along with its primary return value. If multiple error modes are possible, the function may instead return an enumerated return code (error code) along with its primary return value.

Various techniques for returning multiple values include:

Global variable for return status

Similar to an "out" argument, a global variable can store what error occurred (or simply whether an error occurred).

For instance, if an error occurs, and is signalled (generally as above, by an illegal value like −1) the Unix errno variable is set to indicate which value occurred. Using a global has its usual drawbacks: thread safety becomes a concern (modern operating systems use a thread-safe version of errno), and if only one error global is used, its type must be wide enough to contain all interesting information about all possible errors in the system.

Exceptions

Exceptions are one widely used scheme for solving this problem. An error condition is not considered a return value of the function at all; normal control flow is disrupted and explicit handling of the error takes place automatically. They are an example of out-of-band signalling.

Expanding the return value type

Manually created hybrid types

In C, a common approach, when possible, is to use a data type deliberately wider than strictly needed by the function. For example, the standard function getchar() is defined with return type int and returns a value in the range [0,255] (the range of unsigned char) on success or the value EOF (implementation-defined, but outside the range of unsigned char) on the end of the input or a read error.

Nullable reference types

In languages with pointers or references, one solution is to return a pointer to a value, rather than the value itself. This return pointer can then be set to null to indicate an error. It is typically suited to functions that return a pointer anyway. This has a performance advantage over the OOP style of exception handling, [4] with the drawback that negligent programmers may not check the return value, resulting in a crash when the invalid pointer is used. Whether a pointer is null or not is another example of the predicate problem; null may be a flag indicating failure or the value of a pointer returned successfully. A common pattern in the UNIX environment is setting a separate variable to indicate the cause of an error. An example of this is the C standard library fopen() function.

Implicitly hybrid types

In dynamically-typed languages, such as PHP and Lisp, the usual approach is to return false, none, or null when the function call fails. This works by returning a different type to the normal return type (thus expanding the type). It is a dynamically-typed equivalent to returning a null pointer.

For example, a numeric function normally returns a number (int or float), and while zero might be a valid response; false is not. Similarly, a function that normally returns a string might sometimes return the empty string as a valid response, but return false on failure. This process of type-juggling necessitates care in testing the return value: e.g., in PHP, use === (i.e., equal and of same type) rather than just == (i.e., equal, after automatic type-conversion). It works only when the original function is not meant to return a boolean value, and still requires that information about the error be conveyed via other means.

Explicitly hybrid types

In Haskell and other functional programming languages, it is common to use a data type that is just as big as it needs to be to express any possible result. For example, one can write a division function that returned the type Maybe Real, and a getchar function returning Either String Char. The first is an option type, which has only one failure value, Nothing. The second case is a tagged union: a result is either some string with a descriptive error message, or a successfully read character. Haskell's type inference system helps ensure that callers deal with possible errors. Since the error conditions become explicit in the function type, looking at its signature immediately tells the programmer how to treat errors. Further, tagged unions and option types form monads when endowed with appropriate functions: this may be used to keep the code tidy by automatically propagating unhandled error conditions.

Example

Rust has algebraic data types and comes with the built-in Result<T, E> and Option<T> types.

fnfind(key: String)-> Option<String>{ifkey=="hello"{Some(key)}else{None}}

The C++ programming language introduced std::optional<T> in the C++17 update.

std::optional<int>find_int_in_str(std::string_viewstr){constexprautodigits="0123456789";auton=str.find_first_of(digits);if(n==std::string::npos){// The string simply contains no numbers, not necessarily an errorreturnstd::nullopt;}intresult;// More search logic that sets 'result'returnresult;}

and std::expected<T, E> in the C++23 update

enumclassparse_error{kEmptyString,kOutOfRange,kNotANumber};std::expected<int,parse_error>parse_number(std::string_viewstr){if(str.empty()){// Flag one unexpected situation out of severalreturnstd::unexpected(ParseError::kEmptyString);}intresult;// More conversion logic that sets 'result'returnresult;}

See also

Related Research Articles

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), standard output (stdout) and standard error (stderr). Originally I/O happened via a physically connected system console, but standard streams abstract this. When a command is executed via an interactive shell, the streams are typically connected to the text terminal on which the shell is running, but can be changed with redirection or a pipeline. More generally, a child process inherits the standard streams of its parent process.

In computer programming, a parameter or a formal argument is a special kind of variable used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. These pieces of data are the values of the arguments with which the subroutine is going to be called/invoked. An ordered list of parameters is usually included in the definition of a subroutine, so that, each time the subroutine is called, its arguments for that call are evaluated, and the resulting values can be assigned to the corresponding parameters.

In computer programming, a default argument is an argument to a function that a programmer is not required to specify. In most programming languages, functions may take one or more arguments. Usually, each argument must be specified in full. Later languages allow the programmer to specify default arguments that always have a value, even if one is not specified when calling the function.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in the C language. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

In computer programming, the ternary conditional operator is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, ternary if, or inline if. An expression a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c". The form a ? b : c is by far and large the most common, but alternative syntaxes do exist; for example, Raku uses the syntax a ?? b !! c to avoid confusion with the infix operators ? and !, whereas in Visual Basic .NET, it instead takes the form If(a, b, c).

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

In some programming languages, const is a type qualifier that indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in being part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time on each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

String functions are used in computer programming languages to manipulate a string or query information about a string.

A scanf format string is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scanning functions are often supplied in standard libraries. Scanf is a function that reads formatted data from the standard input string, which is usually the keyboard and writes the results whenever called in the specified arguments.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In certain computer programming languages, data types are classified as either value types or reference types, where reference types are always implicitly accessed via references, whereas value type variables directly contain the values themselves.

This article compares a large number of programming languages by tabulating their data types, their expression, statement, and declaration syntax, and some common operating-system interfaces.

In computing, undefined value is a condition where an expression does not have a correct value, although it is syntactically correct. An undefined value must not be confused with empty string, Boolean "false" or other "empty" values. Depending on circumstances, evaluation to an undefined value may lead to exception or undefined behaviour, but in some programming languages undefined values can occur during a normal, predictable course of program execution.

C++14 is a version of the ISO/IEC 14882 standard for the C++ programming language. It is intended to be a small extension over C++11, featuring mainly bug fixes and small improvements, and was replaced by C++17. Its approval was announced on August 18, 2014. C++14 was published as ISO/IEC 14882:2014 in December 2014.

<span class="mw-page-title-main">Zig (programming language)</span> A general-purpose programming language, and toolchain to build Zig/C/C++ code

Zig is an imperative, general-purpose, statically typed, compiled system programming language designed by Andrew Kelley. It is intended to be a replacement for the C programming language, with the goals of being even smaller and simpler to program in while also offering modern features, new optimizations and a variety of safety mechanisms while not as demanding of runtime safety as seen in other languages. It is distinct from languages like Go, Rust and Carbon, which have similar goals but also target the C++ space.

References

  1. Norvig, Peter (1992). "The General Problem Solver". Paradigms of artificial intelligence programming: case studies in common LISP. Morgan Kaufmann. p. 127. ISBN   1-55860-191-0.
  2. "Built-in Types — Python 3.10.4 documentation".
  3. "If i or j is negative, the index is relative to the end of sequence s: len(s) + i or len(s) + j is substituted."common sequence operations note (3)
  4. Why Exceptions should be Exceptional - an example of performance comparison