Assertion (software development)

Last updated

In computer programming, specifically when using the imperative programming paradigm, an assertion is a predicate (a Boolean-valued function over the state space, usually expressed as a logical proposition using the variables of a program) connected to a point in the program, that always should evaluate to true at that point in code execution. Assertions can help a programmer read the code, help a compiler compile it, or help the program detect its own defects.

Contents

For the latter, some programs check assertions by actually evaluating the predicate as they run. Then, if it is not in fact true – an assertion failure – the program considers itself to be broken and typically deliberately crashes or throws an assertion failure exception.

Details

The following code contains two assertions, x > 0 and x > 1, and they are indeed true at the indicated points during execution:

x=1;assertx>0;x++;assertx>1;

Programmers can use assertions to help specify programs and to reason about program correctness. For example, a precondition—an assertion placed at the beginning of a section of code—determines the set of states under which the programmer expects the code to execute. A postcondition—placed at the end—describes the expected state at the end of execution. For example: x > 0 { x++ } x > 1.

The example above uses the notation for including assertions used by C. A. R. Hoare in his 1969 article. [1] That notation cannot be used in existing mainstream programming languages. However, programmers can include unchecked assertions using the comment feature of their programming language. For example, in C++:

x=5;x=x+1;// {x > 1}

The braces included in the comment help distinguish this use of a comment from other uses.

Libraries may provide assertion features as well. For example, in C using glibc with C99 support:

#include<assert.h>intf(void){intx=5;x=x+1;assert(x>1);}

Several modern programming languages include checked assertions – statements that are checked at runtime or sometimes statically. If an assertion evaluates to false at runtime, an assertion failure results, which typically causes execution to abort. This draws attention to the location at which the logical inconsistency is detected and can be preferable to the behaviour that would otherwise result.

The use of assertions helps the programmer design, develop, and reason about a program.

Usage

In languages such as Eiffel, assertions form part of the design process; other languages, such as C and Java, use them only to check assumptions at runtime. In both cases, they can be checked for validity at runtime but can usually also be suppressed.

Assertions in design by contract

Assertions can function as a form of documentation: they can describe the state the code expects to find before it runs (its preconditions), and the state the code expects to result in when it is finished running (postconditions); they can also specify invariants of a class. Eiffel integrates such assertions into the language and automatically extracts them to document the class. This forms an important part of the method of design by contract.

This approach is also useful in languages that do not explicitly support it: the advantage of using assertion statements rather than assertions in comments is that the program can check the assertions every time it runs; if the assertion no longer holds, an error can be reported. This prevents the code from getting out of sync with the assertions.

Assertions for run-time checking

An assertion may be used to verify that an assumption made by the programmer during the implementation of the program remains valid when the program is executed. For example, consider the following Java code:

inttotal=countNumberOfUsers();if(total%2==0){// total is even}else{// total is odd and non-negativeasserttotal%2==1;}

In Java, % is the remainder operator ( modulo ), and in Java, if its first operand is negative, the result can also be negative (unlike the modulo used in mathematics). Here, the programmer has assumed that total is non-negative, so that the remainder of a division with 2 will always be 0 or 1. The assertion makes this assumption explicit: if countNumberOfUsers does return a negative value, the program may have a bug.

A major advantage of this technique is that when an error does occur it is detected immediately and directly, rather than later through often obscure effects. Since an assertion failure usually reports the code location, one can often pin-point the error without further debugging.

Assertions are also sometimes placed at points the execution is not supposed to reach. For example, assertions could be placed at the default clause of the switch statement in languages such as C, C++, and Java. Any case which the programmer does not handle intentionally will raise an error and the program will abort rather than silently continuing in an erroneous state. In D such an assertion is added automatically when a switch statement doesn't contain a default clause.

In Java, assertions have been a part of the language since version 1.4. Assertion failures result in raising an AssertionError when the program is run with the appropriate flags, without which the assert statements are ignored. In C, they are added on by the standard header assert.h defining assert (assertion) as a macro that signals an error in the case of failure, usually terminating the program. In C++, both assert.h and cassert headers provide the assert macro.

The danger of assertions is that they may cause side effects either by changing memory data or by changing thread timing. Assertions should be implemented carefully so they cause no side effects on program code.

Assertion constructs in a language allow for easy test-driven development (TDD) without the use of a third-party library.

Assertions during the development cycle

During the development cycle, the programmer will typically run the program with assertions enabled. When an assertion failure occurs, the programmer is immediately notified of the problem. Many assertion implementations will also halt the program's execution: this is useful, since if the program continued to run after an assertion violation occurred, it might corrupt its state and make the cause of the problem more difficult to locate. Using the information provided by the assertion failure (such as the location of the failure and perhaps a stack trace, or even the full program state if the environment supports core dumps or if the program is running in a debugger), the programmer can usually fix the problem. Thus assertions provide a very powerful tool in debugging.

Assertions in production environment

When a program is deployed to production, assertions are typically turned off, to avoid any overhead or side effects they may have. In some cases assertions are completely absent from deployed code, such as in C/C++ assertions via macros. In other cases, such as Java, assertions are present in the deployed code, and can be turned on in the field for debugging. [2]

Assertions may also be used to promise the compiler that a given edge condition is not actually reachable, thereby permitting certain optimizations that would not otherwise be possible. In this case, disabling the assertions could actually reduce performance.

Static assertions

Assertions that are checked at compile time are called static assertions.

Static assertions are particularly useful in compile time template metaprogramming, but can also be used in low-level languages like C by introducing illegal code if (and only if) the assertion fails. C11 and C++11 support static assertions directly through static_assert. In earlier C versions, a static assertion can be implemented, for example, like this:

#define SASSERT(pred) switch(0){case 0:case pred:;}SASSERT(BOOLEANCONDITION);

If the (BOOLEAN CONDITION) part evaluates to false then the above code will not compile because the compiler will not allow two case labels with the same constant. The boolean expression must be a compile-time constant value, for example (sizeof(int)==4) would be a valid expression in that context. This construct does not work at file scope (i.e. not inside a function), and so it must be wrapped inside a function.

Another popular [3] way of implementing assertions in C is:

staticcharconststatic_assertion[(BOOLEANCONDITION)?1:-1]={'!'};

If the (BOOLEAN CONDITION) part evaluates to false then the above code will not compile because arrays may not have a negative length. If in fact the compiler allows a negative length then the initialization byte (the '!' part) should cause even such over-lenient compilers to complain. The boolean expression must be a compile-time constant value, for example (sizeof(int) == 4) would be a valid expression in that context.

Both of these methods require a method of constructing unique names. Modern compilers support a __COUNTER__ preprocessor define that facilitates the construction of unique names, by returning monotonically increasing numbers for each compilation unit. [4]

D provides static assertions through the use of static assert. [5]

Disabling assertions

Most languages allow assertions to be enabled or disabled globally, and sometimes independently. Assertions are often enabled during development and disabled during final testing and on release to the customer. Not checking assertions avoids the cost of evaluating the assertions while (assuming the assertions are free of side effects) still producing the same result under normal conditions. Under abnormal conditions, disabling assertion checking can mean that a program that would have aborted will continue to run. This is sometimes preferable.

Some languages, including C, YASS and C++, can completely remove assertions at compile time using the preprocessor.

Similarly, launching the Python interpreter with "-O" (for "optimize") as an argument will cause the Python code generator to not emit any bytecode for asserts. [6]

Java requires an option to be passed to the run-time engine in order to enable assertions. Absent the option, assertions are bypassed, but they always remain in the code unless optimised away by a JIT compiler at run-time or excluded at compile time via the programmer manually placing each assertion behind an if (false) clause.

Programmers can build checks into their code that are always active by bypassing or manipulating the language's normal assertion-checking mechanisms.

Comparison with error handling

Assertions are distinct from routine error-handling. Assertions document logically impossible situations and discover programming errors: if the impossible occurs, then something fundamental is clearly wrong with the program. This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely to occur in practice. Using assertions as a general-purpose error handling mechanism is unwise: assertions do not allow for recovery from errors; an assertion failure will normally halt the program's execution abruptly; and assertions are often disabled in production code. Assertions also do not display a user-friendly error message.

Consider the following example of using an assertion to handle an error:

int*ptr=malloc(sizeof(int)*10);assert(ptr);// use ptr...

Here, the programmer is aware that malloc will return a NULL pointer if memory is not allocated. This is possible: the operating system does not guarantee that every call to malloc will succeed. If an out of memory error occurs the program will immediately abort. Without the assertion, the program would continue running until ptr was dereferenced, and possibly longer, depending on the specific hardware being used. So long as assertions are not disabled, an immediate exit is assured. But if a graceful failure is desired, the program has to handle the failure. For example, a server may have multiple clients, or may hold resources that will not be released cleanly, or it may have uncommitted changes to write to a datastore. In such cases it is better to fail a single transaction than to abort abruptly.

Another error is to rely on side effects of expressions used as arguments of an assertion. One should always keep in mind that assertions might not be executed at all, since their sole purpose is to verify that a condition which should always be true does in fact hold true. Consequently, if the program is considered to be error-free and released, assertions may be disabled and will no longer be evaluated.

Consider another version of the previous example:

int*ptr;// Statement below fails if malloc() returns NULL,// but is not executed at all when compiling with -NDEBUG!assert(ptr=malloc(sizeof(int)*10));// use ptr: ptr isn't initialised when compiling with -NDEBUG!...

This might look like a smart way to assign the return value of malloc to ptr and check if it is NULL in one step, but the malloc call and the assignment to ptr is a side effect of evaluating the expression that forms the assert condition. When the NDEBUG parameter is passed to the compiler, as when the program is considered to be error-free and released, the assert() statement is removed, so malloc() isn't called, rendering ptr uninitialised. This could potentially result in a segmentation fault or similar null pointer error much further down the line in program execution, causing bugs that may be sporadic and/or difficult to track down. Programmers sometimes use a similar VERIFY(X) define to alleviate this problem.

Modern compilers may issue a warning when encountering the above code. [7]

History

In 1947 reports by von Neumann and Goldstine [8] on their design for the IAS machine, they described algorithms using an early version of flow charts, in which they included assertions: "It may be true, that whenever C actually reaches a certain point in the flow diagram, one or more bound variables will necessarily possess certain specified values, or possess certain properties, or satisfy certain properties with each other. Furthermore, we may, at such a point, indicate the validity of these limitations. For this reason we will denote each area in which the validity of such limitations is being asserted, by a special box, which we call an assertion box."

The assertional method for proving correctness of programs was advocated by Alan Turing. In a talk "Checking a Large Routine" at Cambridge, June 24, 1949 Turing suggested: "How can one check a large routine in the sense of making sure that it's right? In order that the man who checks may not have too difficult a task, the programmer should make a number of definite assertions which can be checked individually, and from which the correctness of the whole program easily follows". [9]

See also

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In computing, a bus error is a fault raised by hardware, notifying an operating system (OS) that a process is trying to access memory that the CPU cannot physically address: an invalid address for the address bus, hence the name. In modern use on most architectures these are much rarer than segmentation faults, which occur primarily due to memory access violations: problems in the logical address or permissions.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in the C language. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

Short-circuit evaluation, minimal evaluation, or McCarthy evaluation is the semantics of some Boolean operators in some programming languages in which the second argument is executed or evaluated only if the first argument does not suffice to determine the value of the expression: when the first argument of the AND function evaluates to false, the overall value must be false; and when the first argument of the OR function evaluates to true, the overall value must be true.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.

In some programming languages, const is a type qualifier that indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in being part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time on each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

The Boehm–Demers–Weiser garbage collector, often simply known as Boehm GC, is a conservative garbage collector for C and C++ developed by Hans Boehm, Alan Demers, and Mark Weiser.

The Java Modeling Language (JML) is a specification language for Java programs, using Hoare style pre- and postconditions and invariants, that follows the design by contract paradigm. Specifications are written as Java annotation comments to the source files, which hence can be compiled with any Java compiler.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to this, development tools for the two languages are often integrated into a single product, with the programmer able to specify C or C++ as their source language.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

The conditional operator is supported in many programming languages. This term usually refers to ?: as in C, C++, C#, and JavaScript. However, in Java, this term can also refer to && and ||.

<span class="mw-page-title-main">ATS (programming language)</span> Programming language

ATS is a programming language designed to unify programming with formal specification. ATS has support for combining theorem proving with practical programming through the use of advanced type systems. A past version of The Computer Language Benchmarks Game has demonstrated that the performance of ATS is comparable to that of the C and C++ programming languages. By using theorem proving and strict type checking, the compiler can detect and prove that its implemented functions are not susceptible to bugs such as division by zero, memory leaks, buffer overflow, and other forms of memory corruption by verifying pointer arithmetic and reference counting before the program compiles. Additionally, by using the integrated theorem-proving system of ATS (ATS/LF), the programmer may make use of static constructs that are intertwined with the operative code to prove that a function attains its specification.

References

  1. C. A. R. Hoare, An axiomatic basis for computer programming, Communications of the ACM , 1969.
  2. Programming With Assertions, Enabling and Disabling Assertions
  3. Jon Jagger, Compile Time Assertions in C , 1999.
  4. GNU, "GCC 4.3 Release Series — Changes, New Features, and Fixes"
  5. "Static Assertions". D Language Reference. The D Language Foundation. Retrieved 2022-03-16.
  6. Official Python Docs, assert statement
  7. "Warning Options (Using the GNU Compiler Collection (GCC))".
  8. Goldstine and von Neumann. "Planning and Coding of problems for an Electronic Computing Instrument" Archived 2018-11-12 at the Wayback Machine . Part II, Volume I, 1 April 1947, p. 12.
  9. Alan Turing. Checking a Large Routine, 1949; quoted in C. A. R. Hoare, "The Emperor's Old Clothes", 1980 Turing Award lecture.