Unspecified behavior

Last updated

Unspecified behavior is behavior that may vary on different implementations of a programming language.[ clarification needed ] A program can be said to contain unspecified behavior when its source code may produce an executable that exhibits different behavior when compiled on a different compiler, or on the same compiler with different settings, or indeed in different parts of the same executable. While the respective language standards or specifications may impose a range of possible behaviors, the exact behavior depends on the implementation and may not be completely determined upon examination of the program's source code. [1] Unspecified behavior will often not manifest itself in the resulting program's external behavior, but it may sometimes lead to differing outputs or results, potentially causing portability problems.

Contents

Definition

To enable compilers to produce optimal code for their respective target platforms, programming language standards do not always impose a certain specific behavior for a given source code construct. [2] Failing to explicitly define the exact behavior of every possible program is not considered an error or weakness in the language specification, and doing so would be infeasible. [1] In the C and C++ languages, such non-portable constructs are generally grouped into three categories: Implementation-defined, unspecified, and undefined behavior. [3]

The exact definition of unspecified behavior varies. In C++, it is defined as "behavior, for a well-formed program construct and correct data, that depends on the implementation." [4] The C++ Standard also notes that the range of possible behaviors is usually provided. [4] Unlike implementation-defined behavior, there is no requirement for the implementation to document its behavior. [4] Similarly, the C Standard defines it as behavior for which the standard "provides two or more possibilities and imposes no further requirements on which is chosen in any instance". [5] Unspecified behavior is different from undefined behavior. The latter is typically a result of an erroneous program construct or data, and no requirements are placed on the translation or execution of such constructs. [6]

Implementation-defined behavior

C and C++ distinguish implementation-defined behavior from unspecified behavior. For implementation-defined behavior, the implementation must choose a particular behavior and document it. An example in C/C++ is the size of integer data types. The choice of behavior must be consistent with the documented behavior within a given execution of the program.

Examples

Order of evaluation of subexpressions

Many programming languages do not specify the order of evaluation of the sub-expressions of a complete expression. This non-determinism can allow optimal implementations for specific platforms e.g. to utilise parallelism. If one or more of the sub-expressions has side effects, then the result of evaluating the full-expression may be different depending on the order of evaluation of the sub-expressions. [1] For example, given

a=f(b)+g(b);

, where f and g both modify b, the result stored in a may be different depending on whether f(b) or g(b) is evaluated first. [1] In the C and C++ languages, this also applies to function arguments. Example: [2]

#include<iostream>intf(){std::cout<<"In f\n";return3;}intg(){std::cout<<"In g\n";return4;}intsum(inti,intj){returni+j;}intmain(){returnsum(f(),g());}

The resulting program will write its two lines of output in an unspecified order. [2] In other languages, such as Java, the order of evaluation of operands and function arguments is explicitly defined. [7]

See also

Related Research Articles

ANSI C, ISO C, and Standard C are successive standards for the C programming language published by the American National Standards Institute (ANSI) and ISO/IEC JTC 1/SC 22/WG 14 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Historically, the names referred specifically to the original and best-supported version of the standard. Software developers writing in C are encouraged to conform to the standards, as doing so helps portability between compilers.

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

The Common Language Infrastructure (CLI) is an open specification and technical standard originally developed by Microsoft and standardized by ISO/IEC and Ecma International that describes executable code and a runtime environment that allows multiple high-level languages to be used on different computer platforms without being rewritten for specific architectures. This implies it is platform agnostic. The .NET Framework, .NET and Mono are implementations of the CLI. The metadata format is also used to specify the API definitions exposed by the Windows Runtime.

<span class="mw-page-title-main">C++</span> General-purpose programming language

C++ is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup. First released in 1985 as an extension of the C programming language, it has since expanded significantly over time; modern C++ currently has object-oriented, generic, and functional features, in addition to facilities for low-level memory manipulation. It is almost always implemented as a compiled language, and many vendors provide C++ compilers, including the Free Software Foundation, LLVM, Microsoft, Intel, Embarcadero, Oracle, and IBM.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

<span class="mw-page-title-main">C99</span> C programming language standard, 1999 revision

C99 is an informal name for ISO/IEC 9899:1999, a past version of the C programming language standard. It extends the previous version (C90) with new features for the language and the standard library, and helps implementations make better use of available computer hardware, such as IEEE 754-1985 floating-point arithmetic, and compiler technology. The C11 version of the C programming language standard, published in 2011, replaces C99.

Short-circuit evaluation, minimal evaluation, or McCarthy evaluation is the semantics of some Boolean operators in some programming languages in which the second argument is executed or evaluated only if the first argument does not suffice to determine the value of the expression: when the first argument of the AND function evaluates to false, the overall value must be false; and when the first argument of the OR function evaluates to true, the overall value must be true.

In C and C++, sequence point defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed. They are a core concept for determining the validity of and, if valid, the possible results of expressions. Adding more sequence points is sometimes necessary to make an expression defined and to ensure a single valid order of evaluation.

In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another.

The One Definition Rule (ODR) is an important rule of the C++ programming language that prescribes that classes/structs and non-inline functions cannot have more than one definition in the entire program and template and types cannot have more than one definition by translation unit. It is defined in the ISO C++ Standard 2003, at section 3.2.

<span class="mw-page-title-main">C data types</span> Data types supported by the C programming language

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

C++ Technical Report 1 (TR1) is the common name for ISO/IEC TR 19768, C++ Library Extensions, which is a document that proposed additions to the C++ standard library for the C++03 language standard. The additions include regular expressions, smart pointers, hash tables, and random number generators. TR1 was not a standard itself, but rather a draft document. However, most of its proposals became part of the later official standard, C++11. Before C++11 was standardized, vendors used this document as a guide to create extensions. The report's goal was "to build more widespread existing practice for an expanded C++ standard library".

A class in C++ is a user-defined type or data structure declared with keyword class that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class is private. The private members are not accessible outside the class; they can be accessed only through methods of the class. The public members form an interface to the class and are accessible outside the class.

assert.h is a header file in the C standard library. It defines the C preprocessor macro assert and implements runtime assertion in C.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

In computer science, a type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

In C++ computer programming, allocators are a component of the C++ Standard Library. The standard library provides several data structures, such as list and set, commonly referred to as containers. A common trait among these containers is their ability to change size during the execution of the program. To achieve this, some form of dynamic memory allocation is usually required. Allocators handle all the requests for allocation and deallocation of memory for a given container. The C++ Standard Library provides general-purpose allocators that are used by default, however, custom allocators may also be supplied by the programmer.

In C++ computer programming, copy elision refers to a compiler optimization technique that eliminates unnecessary copying of objects.

In computing, sequence containers refer to a group of container class templates in the standard library of the C++ programming language that implement storage of data elements. Being templates, they can be used to store arbitrary elements, such as integers or custom classes. One common property of all sequential containers is that the elements can be accessed sequentially. Like all other standard library components, they reside in namespace std.

References

  1. 1 2 3 4 ISO/IEC (2009-05-29). ISO/IEC PDTR 24772.2: Guidance to Avoiding Vulnerabilities in Programming Languages through Language Selection and Use
  2. 1 2 3 Becker, Pete (2006-05-16). "Living By the Rules". Dr. Dobb's Journal . Retrieved 26 November 2009.
  3. Henricson, Mats; Nyquist, Erik (1997). Industrial Strength C++ . Prentice Hall. ISBN   0-13-120965-5.
  4. 1 2 3 ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §1.3.13 unspecified behavior [defns.unspecified]
  5. ISO/IEC (1999). ISO/IEC 9899:1999(E): Programming Languages - C §3.4.4 para 1
  6. ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §1.3.12 undefined behavior [defns.undefined]
  7. James Gosling, Bill Joy, Guy Steele, and Gilad Bracha (2005). The Java Language Specification, Third Edition. Addison-Wesley. ISBN   0-321-24678-0