Undefined value

Last updated December 10, 2021

In computing (particularly, in programming), undefined value is a condition where an expression does not have a correct value, although it is syntactically correct. An undefined value must not be confused with empty string, Boolean "false" or other "empty" (but defined) values. Depending on circumstances, evaluation to an undefined value may lead to exception or undefined behaviour, but in some programming languages undefined values can occur during a normal, predictable course of program execution.

Dynamically typed languages usually treat undefined values explicitly when possible. For instance, Perl has undef operator^[1] which can "assign" such value to a variable. In other type systems an undefined value can mean an unknown, unpredictable value, or merely a program failure on attempt of its evaluation. Nullable types offer an intermediate approach; see below.

Handling

The value of a partial function is undefined when its argument is out of its domain of definition. This include numerous arithmetical cases such as division by zero, square root or logarithm of a negative number etc. Another common example is accessing an array with an index which is out of bounds, as is the value in an associative array for a key which it does not contain. There are various ways that these situations are handled in practice:

Reserved value

In applications where undefined values must be handled gracefully, it is common to reserve a special null value which is distinguishable from normal values. This resolves the difficulty by creating a defined value to represent the formerly undefined case. There are many examples of this:

The C standard I/O library reserves the special value EOF to indicate that no more input is available. The getchar() function returns the next available input character, or EOF if there is no more available. (The ASCII character code defines a null character for this purpose, but the standard I/O library wishes to be able to send and receive null characters, so it defines a separate EOF value.)
The IEEE 754 floating-point arithmetic standard defines a special "not a number" value which is returned when an arithmetic operation has no defined value. Examples are division by zero, or the square root or logarithm of a negative number.
Structured Query Language has a special NULL value to indicate missing data.
The Perl language lets the definedness of an expression be checked via the defined() predicate.^[2]
Many programming languages support the concept of a null pointer distinct from any valid pointer, and often used as an error return.
Some languages allow most types to be nullable, for example C#.^[3]
Most Unix system calls return the special value −1 to indicate failure.

While dynamically typed languages often ensure that uninitialized variables default to a null value, statically typed values often do not, and distinguish null values (which are well-defined) from uninitialized values (which are not).^[3]

Exception handling

Some programming languages have a concept of exception handling for dealing with failure to return a value. The function returns in a defined way, but it does not return a value, so there is no need to invent a special value to return.

A variation on this is signal handling, which is done at the operating system level and not integrated into a programming language. Signal handlers can attempt some forms of recovery, such as terminating part of a computation, but without as much flexibility as fully integrated exception handling.

Non-returning functions

A function which never returns has an undefined value because the value can never be observed. Such functions are formally assigned the bottom type, which has no values. Examples fall into two categories:

Functions which loop forever. This may arise deliberately, or as a result of a search for something which will never be found. (For example, in the case of failed μ operator in a partial recursive function.)
Functions which terminate the computation, such as the exit system call. From within the program, this is indistinguishable from the preceding case, but it makes a difference to the invoker of the program.

Undefined behaviour

All of the preceding methods of handling undefined values require that the undefinedness be detected. That is, the called function determines that it cannot return a normal result and takes some action to notify the caller. At the other end of the spectrum, undefined behaviour places the onus on the caller to avoid calling a function with arguments outside of its domain. There is no limit on what might happen. At best, an easily detectable crash; at worst, a subtle error in a seemingly unrelated computation.

(The formal definition of "undefined behaviour" includes even more extreme possibilities, including things like "halt and catch fire" and "make demons fly out of your nose".^[4])

The classic example is a dangling pointer reference. It is very fast to dereference a valid pointer, but can be very complex to determine if a pointer is valid. Therefore, computer hardware and low-level languages such as C do not attempt to validate pointers before dereferencing them, instead passing responsibility to the programmer. This offers speed at the expense of safety.

Undefined value sensu stricto

The strict definition of an undefined value is a superficially valid (non-null) output which is meaningless but does not trigger undefined behaviour. For example, passing a negative number to the fast inverse square root function will produce a number. Not a very useful number, but the computation will complete and return something.

Undefined values occur particularly often in hardware. If a wire is not carrying useful information, it still exists and has some voltage level. The voltage should not be abnormal (e.g. not a damaging overvoltage), but the particular logic level is unimportant.

The same situation occurs in software when a data buffer is provided but not completely filled. For example, the C library strftime function converts a timestamp to human-readable form in a supplied output buffer. If the output buffer is not large enough to hold the result, an error is returned and the buffer's contents are undefined.

In the other direction, the open system call in POSIX takes three arguments: a file name, some flags, and a file mode. The file mode is only used if the flags include O_CREAT. It is common to use a two-argument form of open, which provides an undefined value for the file mode, when O_CREAT is omitted.

Sometimes it is useful to work with such undefined values in a limited way. The overall computation can still be well-defined if the undefined value is later ignored.

As an example of this, the C language permits converting a pointer to an integer, although the numerical value of that integer is undefined. It may still be useful for debugging, for comparing two pointers for equality, or for creating an XOR linked list.

Safely handling undefined values is important in optimistic concurrency control systems, which detect race conditions after the fact. For example, reading a shared variable protected by seqlock will produce an undefined value before determining that a race condition happened. It will then discard the undefined data and retry the operation. This produces a defined result as long as the operations performed on the undefined values do not produce full-fledged undefined behaviour.

Other examples of undefined values being useful are random number generators and hash functions. The specific values returned are undefined, but they have well-defined properties and may be used without error.

Notation

In computability theory, undefinedness of an expression is denoted as expr↑, and definedness as expr↓.

Related Research Articles

C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system. By design, C provides constructs that map efficiently to typical machine instructions. It has found lasting use in applications previously coded in assembly language. Such applications include operating systems and various application software for computer architectures that range from supercomputers to PLCs and embedded systems.

The Cyclone programming language is intended to be a safe dialect of the C language. Cyclone is designed to avoid buffer overflows and other vulnerabilities that are possible in C programs, without losing the power and convenience of C as a tool for system programming.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1) and GLR parsers.

In computer science, a reference is a value that enables a program to indirectly access a particular data, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference.

The syntax of the C programming language is the set of rules governing writing of software in the C language. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

Pointer (computer programming) Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In computer programming, ?: is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, inline if (iif), or ternary if. An expression a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c".

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.

In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.

In computer programming, an automatic variable is a local variable which is allocated and deallocated automatically when program flow enters and leaves the variable's scope. The scope is the lexical context, particularly the function or block in which a variable is defined. Local data is typically invisible outside the function or lexical context where it is defined. Local data is also invisible and inaccessible to a called function, but is not deallocated, coming back in scope as the execution thread returns to the caller.

For most file systems, a program initializes access to a file in a file system using the open system call. This allocates resources associated to the file, and returns a handle that the process will use to refer to that file. In some cases the open is performed by the first access.

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In type theory, a theory within mathematical logic, the bottom type is the type that has no values. It is also called the zero or empty type, and is sometimes denoted with the up tack (⊥) symbol.

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value. The problem is that the caller of the subroutine cannot tell what the result means in this case.

Nullable types are a feature of some programming languages which allow the value to be set to the special value NULL instead of the usual possible values of the data type. In statically typed languages, a nullable type is an option type, while in dynamically typed languages, equivalent behavior is provided by having a single null value.

In computer science, a three-way comparison takes two values A and B belonging to a type with a total order and determines whether A < B, A = B, or A > B in a single operation, in accordance with the mathematical law of trichotomy.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, including C#, PowerShell as of version 7.0.0, Perl as of version 5.10, Swift, and PHP 7.0.0. While its behavior differs between implementations, the null coalescing operator generally returns the result of its left-most operand if it exists and is not null, and otherwise returns the right-most operand. This behavior allows a default value to be defined for cases where a more specific value is not available.

References

↑ "undef". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.
↑ "defined". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.
1 2 Carr, Richard (2006-10-01). "C# Nullable Numeric Data Types". C# Fundamentals tutorial. Retrieved 2010-03-27.
↑ "Nasal demons". Jargon File . Retrieved 2014-06-12.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Perl_2009_Undef-1] "undef". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.

[Perl_2009_Defined-2] "defined". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.

[Carr_2006-3] 1 2 Carr, Richard (2006-10-01). "C# Nullable Numeric Data Types". C# Fundamentals tutorial. Retrieved 2010-03-27.

[Catb-4] "Nasal demons". Jargon File . Retrieved 2014-06-12.

[1]

[2]

[3]

[4]