Bounds checking

Last updated

In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as an array index is within the bounds of the array (index checking). A failed bounds check usually results in the generation of some sort of exception signal.

Contents

As performing bounds checking during each use can be time-consuming, it is not always done. Bounds-checking elimination is a compiler optimization technique that eliminates unneeded bounds checking.

Range checking

A range check is a check to make sure a number is within a certain range; for example, to ensure that a value about to be assigned to a 16-bit integer is within the capacity of a 16-bit integer (i.e. checking against wrap-around). This is not quite the same as type checking.[ how? ] Other range checks may be more restrictive; for example, a variable to hold the number of a calendar month may be declared to accept only the range 1 to 12.

Index checking

Index checking means that, in all expressions indexing an array, the index value is checked against the bounds of the array (which were established when the array was defined), and if the index is out-of-bounds, further execution is suspended via some sort of error. Because reading or especially writing a value outside the bounds of an array may cause the program to malfunction or crash or enable security vulnerabilities (see buffer overflow), index checking is a part of many high-level languages.

Early compiled programming languages with index checking ability included ALGOL 60, ALGOL 68 and Pascal, as well as interpreted programming languages such as BASIC.

Many programming languages, such as C, never perform automatic bounds checking to raise speed. However, this leaves many off-by-one errors and buffer overflows uncaught. Many programmers believe these languages sacrifice too much for rapid execution. [1] In his 1980 Turing Award lecture, C. A. R. Hoare described his experience in the design of ALGOL 60, a language that included bounds checking, saying:

A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

Mainstream languages that enforce run time checking include Ada, C#, Haskell, Java, JavaScript, Lisp, PHP, Python, Ruby, Rust, and Visual Basic. The D and OCaml languages have run time bounds checking that is enabled or disabled with a compiler switch. In C++ run time checking is not part of the language, but part of the STL and is enabled with a compiler switch (_GLIBCXX_DEBUG=1 or _LIBCPP_DEBUG=1). C# also supports unsafe regions: sections of code that (among other things) temporarily suspend bounds checking to raise efficiency. These are useful for speeding up small time-critical bottlenecks without sacrificing the safety of a whole program.

The JS++ programming language is able to analyze if an array index or map key is out-of-bounds at compile time using existent types, which is a nominal type describing whether the index or key is within-bounds or out-of-bounds and guides code generation. Existent types have been shown to add only 1ms overhead to compile times. [2]

Hardware bounds checking

The safety added by bounds checking necessarily costs CPU time if the checking is performed in software; however, if the checks could be performed by hardware, then the safety can be provided "for free" with no runtime cost. An early system with hardware bounds checking was the ICL 2900 Series mainframe announced in 1974. [3] The VAX computer has an INDEX assembly instruction for array index checking which takes six operands, all of which can use any VAX addressing mode. The B6500 and similar Burroughs computers performed bound checking via hardware, irrespective of which computer language had been compiled to produce the machine code. A limited number of later CPUs have specialised instructions for checking bounds, e.g., the CHK2 instruction on the Motorola 68000 series.

Research has been underway since at least 2005 regarding methods to use x86's built-in virtual memory management unit to ensure safety of array and buffer accesses. [4] In 2015 Intel provided their Intel MPX extensions in their Skylake processor architecture which stores bounds in a CPU register and table in memory. As of early 2017 at least GCC supports MPX extensions.

See also

Related Research Articles

<span class="mw-page-title-main">Buffer overflow</span> Anomaly in computer security and programming

In programming and information security, a buffer overflow or buffer overrun is an anomaly whereby a program writes data to a buffer beyond the buffer's allocated memory, overwriting adjacent memory locations.

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

The Burroughs Large Systems Group produced a family of large 48-bit mainframes using stack machine instruction sets with dense syllables. The first machine in the family was the B5000 in 1961, which was optimized for compiling ALGOL 60 programs extremely well, using single-pass compilers. The B5000 evolved into the B5500 and the B5700. Subsequent major redesigns include the B6500/B6700 line and its successors, as well as the separate B8500 line.

Execution in computer and software engineering is the process by which a computer or virtual machine interpret and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a 'fetch–decode–execute' cycle for each instruction done by control unit. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

In computer programming, a runtime library is a set of low-level routines used by a compiler to invoke some of the behaviors of a runtime environment, by inserting calls to the runtime library into compiled executable binary. The runtime environment implements the execution model, built-in functions, and other fundamental behaviors of a programming language. During execution of that computer program, execution of those calls to the runtime library cause communication between the executable binary and the runtime environment. A runtime library often includes built-in functions for memory management or exception handling. Therefore, a runtime library is always specific to the platform and compiler.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

Buffer overflow protection is any of various techniques used during software development to enhance the security of executable programs by detecting buffer overflows on stack-allocated variables, and preventing them from causing program misbehavior or from becoming serious security vulnerabilities. A stack buffer overflow occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer. Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. This almost always results in corruption of adjacent data on the stack, which could lead to program crashes, incorrect operation, or security issues.

<span class="mw-page-title-main">Integer overflow</span> Computer arithmetic error

In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximum or lower than the minimum representable value.

In computer science, bounds-checking elimination is a compiler optimization useful in programming languages or runtime systems that enforce bounds checking, the practice of checking every index into an array to verify that the index is within the defined valid range of indexes. Its goal is to detect which of these indexing operations do not need to be validated at runtime, and eliminating those checks.

The following outline is provided as an overview of and topical guide to computer programming:

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In computer programming and software development, debugging is the process of finding and resolving bugs within computer programs, software, or systems.

Intel MPX are discontinued set of extensions to the x86 instruction set architecture. With compiler, runtime library and operating system support, Intel MPX claimed to enhance security to software by checking pointer references whose normal compile-time intentions are maliciously exploited at runtime due to buffer overflows. In practice, there have been too many flaws discovered in the design for it to be useful, and support has been deprecated or removed from most compilers and operating systems. Intel has listed MPX as removed in 2019 and onward hardware in section 2.5 of its Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1.

A code sanitizer is a programming tool that detects bugs in the form of undefined or suspicious behavior by a compiler inserting instrumentation code at runtime. The class of tools was first introduced by Google's AddressSanitizer of 2012, which uses directly mapped shadow memory to detect memory corruption such as buffer overflows or accesses to a dangling pointer (use-after-free).

A high-level language computer architecture (HLLCA) is a computer architecture designed to be targeted by a specific high-level programming language (HLL), rather than the architecture being dictated by hardware considerations. It is accordingly also termed language-directed computer design, coined in McKeeman (1967) and primarily used in the 1960s and 1970s. HLLCAs were popular in the 1960s and 1970s, but largely disappeared in the 1980s. This followed the dramatic failure of the Intel 432 (1981) and the emergence of optimizing compilers and reduced instruction set computer (RISC) architectures and RISC-like complex instruction set computer (CISC) architectures, and the later development of just-in-time compilation (JIT) for HLLs. A detailed survey and critique can be found in Ditzel & Patterson (1980).

In computer science, language-based security (LBS) is a set of techniques that may be used to strengthen the security of applications on a high level by using the properties of programming languages. LBS is considered to enforce computer security on an application-level, making it possible to prevent vulnerabilities which traditional operating system security is unable to handle.

References

  1. Cowan, C; Wagle, F; Calton Pu; Beattie, S; Walpole, J (1999). "Buffer overflows: Attacks and defenses for the vulnerability of the decade". Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00. Vol. 2. pp. 119–129. doi:10.1109/DISCEX.2000.821514. ISBN   978-0-7695-0490-2. S2CID   167759976.
  2. "JS++ 0.9.0: Efficient Compile Time Analysis of Out-of-Bounds Errors – JS++ Blog". 11 January 2019. Archived from the original on 2019-01-12.
  3. J. K. Buckle (1978). The ICL 2900 Series (PDF). Macmillan Computer Science Series. pp. 17, 77. ISBN   978-0-333-21917-1. Archived from the original (PDF) on 20 April 2018. Retrieved 20 April 2018.
  4. Lap-Chung Lam; Tzi-Cker Chiueh (2005). "Checking Array Bound Violation Using Segmentation Hardware". 2005 International Conference on Dependable Systems and Networks (DSN'05). pp. 388–397. doi:10.1109/DSN.2005.25. ISBN   0-7695-2282-3. S2CID   6278708.