Code sanitizer

A code sanitizer is a programming tool that detects bugs in the form of undefined or suspicious behavior. The compiler inserts instrumentation code when the program is built, and the inserted checks run as the program executes. The class of tools was first introduced by Google's AddressSanitizer (or ASan) of 2012, which uses directly mapped shadow memory to detect memory corruption such as buffer overflows or accesses through a dangling pointer (use-after-free).

AddressSanitizer

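Google's ASan, introduced in 2012, uses a shadow memory scheme to detect memory bugs. It is available in Clang (from LLVM 3.1), [1] GCC (from version 4.8), [2] Xcode, [3] and MSVC (from Visual Studio 2019 version 16.9). [4]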

On average, the instrumentation increases processing time by about 73% and memory usage by 240%. [5] A hardware-assisted variant of ASan, called HWASan, is available for AArch64 and (in a limited fashion) x86_64. [6]
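
The shadow memory scheme can be illustrated with a little address arithmetic. The sketch below is not ASan's source code. It assumes the mapping commonly described for x86-64 Linux (one shadow byte per 8 application bytes, shadow offset 0x7fff8000), and the names shadow_address and access_allowed are hypothetical. Under those assumptions, the faulting address in the heap-use-after-free report in the Examples section maps to the bracketed fd byte in its shadow dump.

```cpp
#include <cstdint>

// Assumed constants for x86-64 Linux; other platforms use other offsets.
constexpr std::uint64_t kShadowScale = 3;            // 2^3 = 8 application bytes per shadow byte
constexpr std::uint64_t kShadowOffset = 0x7fff8000;  // assumption, not stated in this article

// Address of the shadow byte that describes the 8-byte granule containing addr.
constexpr std::uint64_t shadow_address(std::uint64_t addr) {
    return (addr >> kShadowScale) + kShadowOffset;
}

// Hypothetical helper: would an access of `size` bytes (1 to 8, within one
// granule) starting at addr be allowed, given the granule's shadow value?
constexpr bool access_allowed(std::uint64_t addr, std::uint64_t size,
                              std::int8_t shadow_value) {
    if (shadow_value == 0) return true;   // all 8 bytes are addressable
    if (shadow_value < 0) return false;   // poisoned: redzone, freed memory, ...
    // Values 1..7: only the first shadow_value bytes of the granule are addressable.
    return (addr & 7) + size <= static_cast<std::uint64_t>(shadow_value);
}

// The use-after-free address from the Examples section lands on shadow byte
// 0x0c287fff9fc8, the ninth byte of the row marked "=>" in that report.
static_assert(shadow_address(0x61400000fe44) == 0x0c287fff9fc8,
              "faulting address maps to the bracketed shadow byte");

// That shadow byte holds 0xfd (freed heap region), which is -3 as a signed
// byte, so a 4-byte read there is rejected.
static_assert(!access_allowed(0x61400000fe44, 4, -3),
              "reads from freed heap memory are reported");
```

Because the mapping is a shift and an add, the inserted check costs only a few instructions per memory access, at the price of reserving a large shadow region.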

AddressSanitizer does not detect uninitialized memory reads (these are detected by MemorySanitizer [7]), and it detects only some use-after-return bugs. [8] It also cannot detect all arbitrary memory corruption bugs, nor all arbitrary write bugs caused by integer underflow or overflow (when the integer with undefined behavior is used to calculate memory address offsets). Adjacent buffers in structs and classes are not protected from overflow, in part to avoid breaking backwards compatibility. [9]
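
The limitation on adjacent buffers can be made concrete with a small sketch. The struct Record and the function fill_name are hypothetical names chosen for illustration. The point is only that two array members laid out back to back leave no gap in which a redzone could be placed, so a write that runs from one member into the next stays inside memory that ASan considers addressable.

```cpp
#include <cstddef>

// Hypothetical record with two adjacent fixed-size buffers.
struct Record {
    char name[8];
    char tag[8];  // starts immediately after name
};

// The layout leaves no padding between the two members, so there is no
// room for a poisoned redzone between them.
static_assert(offsetof(Record, tag) == sizeof(Record::name),
              "tag directly follows name");
static_assert(sizeof(Record) == 16, "no padding anywhere in Record");

// Writes `count` copies of `fill` starting at name[0].
// Calling this with count > 8 is a deliberate bug: the extra bytes land in
// tag. The whole write still falls inside the Record object, which is why
// ASan is not expected to report it.
void fill_name(Record *record, char fill, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        record->name[i] = fill;  // out of bounds once i >= 8
    }
}
```

A call such as fill_name(&r, 'A', 12) overwrites the first four bytes of r.tag without touching any redzone.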

KernelAddressSanitizer

The KernelAddressSanitizer (KASan) detects dynamic memory errors in the Linux kernel. [10] Kernel instrumentation requires dedicated compiler support, enabled with the -fsanitize=kernel-address command line option, because kernels do not use the same address space as normal programs. [11] [12]

KASan is also available for Windows kernel drivers, beginning with Windows 11 22H2. [13] As on Linux, compiling a Windows driver with KASan requires passing the /fsanitize=kernel-address command line option to the MSVC compiler.

Other sanitizers

Google has also produced LeakSanitizer (LSan, memory leaks), ThreadSanitizer (TSan, data races and deadlocks), MemorySanitizer (MSan, uninitialized memory), and UndefinedBehaviorSanitizer (UBSan, undefined behavior, with fine-grained control). [14] These tools are generally available in Clang/LLVM and GCC. [15] [16] [17] As with KASan, there are kernel-specific versions of LSan, MSan, and TSan, as well as kernel-only sanitizers such as KFENCE and KCSan. [18]
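
As an illustration of the kind of bug ThreadSanitizer targets, the following sketch contrasts a racy counter with a race-free one. The function names and the iteration count are hypothetical. The racy version is deliberately wrong and is the one a build with -fsanitize=thread would be expected to flag.

```cpp
// Possible build line: clang++ -O1 -g -fsanitize=thread race.cc
#include <atomic>
#include <cassert>
#include <thread>

constexpr int kIterations = 100000;

// Deliberately racy: two threads update the same plain int without any
// synchronization. This is a data race (undefined behavior).
int count_with_race() {
    int counter = 0;
    auto work = [&counter] {
        for (int i = 0; i < kIterations; ++i) ++counter;
    };
    std::thread first(work);
    std::thread second(work);
    first.join();
    second.join();
    return counter;  // may be less than 2 * kIterations
}

// Race-free variant: std::atomic makes each increment indivisible.
int count_with_atomic() {
    std::atomic<int> counter{0};
    auto work = [&counter] {
        for (int i = 0; i < kIterations; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread first(work);
    std::thread second(work);
    first.join();
    second.join();
    const int total = counter.load();
    assert(total == 2 * kIterations);  // no increment is lost
    return total;
}
```

Without a sanitizer, count_with_race may appear to work on many runs, which is why this class of bug is hard to find by testing alone.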

Additional sanitizer tools (grouped by compilers under -fsanitize or a similar flag) include LLVM control-flow integrity, GWP-ASan, [19] and libFuzzer, [20] among others. [15] [16] [17]

Usage

A code sanitizer detects suspicious behavior as the program runs. One common way to use a sanitizer is to combine it with fuzzing, which generates inputs likely to trigger bugs. [21]
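
A minimal sketch of this combination, using libFuzzer's entry point LLVMFuzzerTestOneInput, is shown below. The function looks_like_header and its "FUZ" input format are hypothetical and contain a deliberate one-byte over-read for the fuzzer to find. The build line in the comment is one possible invocation and the file name is an assumption.

```cpp
// Possible build line: clang++ -O1 -g -fsanitize=fuzzer,address fuzz_target.cc
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical code under test: accepts inputs that start with "FUZ"
// followed by a version byte equal to 1.
bool looks_like_header(const std::uint8_t *data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 'F' && data[1] == 'U' && data[2] == 'Z') {
        // Deliberate bug: size may be exactly 3, in which case data[3]
        // is one byte past the end of the input.
        return data[3] == 1;
    }
    return false;
}

// libFuzzer calls this function repeatedly with generated inputs.
// The fuzzer supplies main(), so this file defines none.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data, std::size_t size) {
    looks_like_header(data, size);
    return 0;  // the return value 0 tells libFuzzer the input was processed
}
```

The fuzzer's job is to discover the three-byte input "FUZ". The sanitizer's job is to turn the resulting over-read, which might otherwise go unnoticed, into a report.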

Users

Chromium and Firefox developers are active users of AddressSanitizer; [21] [22] the tool has found hundreds of bugs in these web browsers. [23] A number of bugs were found in FFmpeg [24] and FreeType. [25] The Linux kernel has enabled the AddressSanitizer for the x86-64 architecture as of Linux version 4.0.

Examples

ASan: Heap-use-after-free

// To compile: g++ -O -g -fsanitize=address heap-use-after-free.cc
int main(int argc, char **argv) {
  int *array = new int[100];
  delete[] array;
  return array[argc];  // BOOM
}

$ ./a.out
==5587==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x47b55f bp 0x7ffc36b28200 sp 0x7ffc36b281f8
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x47b55e in main /home/test/example_UseAfterFree.cc:5
    #1 0x7f15cfe71b14 in __libc_start_main (/lib64/libc.so.6+0x21b14)
    #2 0x47b44c in _start (/root/a.out+0x47b44c)

0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x465da9 in operator delete[](void*) (/root/a.out+0x465da9)
    #1 0x47b529 in main /home/test/example_UseAfterFree.cc:4

previously allocated by thread T0 here:
    #0 0x465aa9 in operator new[](unsigned long) (/root/a.out+0x465aa9)
    #1 0x47b51e in main /home/test/example_UseAfterFree.cc:3

SUMMARY: AddressSanitizer: heap-use-after-free /home/test/example_UseAfterFree.cc:5 main
Shadow bytes around the buggy address:
  [...]
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  [...]
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap right redzone:    fb
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==5587==ABORTING

ASan: Heap-buffer-overflow

// RUN: clang++ -O -g -fsanitize=address heap-buf-of.cc && ./a.out
int main(int argc, char **argv) {
  int *array = new int[100];
  array[0] = 0;
  int res = array[argc + 100];  // BOOM
  delete[] array;
  return res;
}

==25372==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000ffd4 at pc 0x0000004ddb59 bp 0x7fffea6005a0 sp 0x7fffea600598
READ of size 4 at 0x61400000ffd4 thread T0
    #0 0x46bfee in main /tmp/main.cpp:4:13

0x61400000ffd4 is located 4 bytes to the right of 400-byte region [0x61400000fe40,0x61400000ffd0)
allocated by thread T0 here:
    #0 0x4536e1 in operator new[](unsigned long)
    #1 0x46bfb9 in main /tmp/main.cpp:2:16

ASan: Stack-buffer-overflow

// RUN: clang -O -g -fsanitize=address stack-buf-of.cc && ./a.out
int main(int argc, char **argv) {
  int stack_array[100];
  stack_array[1] = 0;
  return stack_array[argc + 100];  // BOOM
}

==7405==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff64740634 at pc 0x46c103 bp 0x7fff64740470 sp 0x7fff64740468
READ of size 4 at 0x7fff64740634 thread T0
    #0 0x46c102 in main /tmp/example_StackOutOfBounds.cc:5

Address 0x7fff64740634 is located in stack of thread T0 at offset 436 in frame
    #0 0x46bfaf in main /tmp/example_StackOutOfBounds.cc:2

  This frame has 1 object(s):
    [32, 432) 'stack_array' <== Memory access at offset 436 overflows this variable

ASan: Global-buffer-overflow

// RUN: clang -O -g -fsanitize=address global-buf-of.cc && ./a.out
int global_array[100] = {-1};
int main(int argc, char **argv) {
  return global_array[argc + 100];  // BOOM
}

==7455==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000689b54 at pc 0x46bfd8 bp 0x7fff515e5ba0 sp 0x7fff515e5b98
READ of size 4 at 0x000000689b54 thread T0
    #0 0x46bfd7 in main /tmp/example_GlobalOutOfBounds.cc:4

0x000000689b54 is located 4 bytes to the right of
  global variable 'global_array' from 'example_GlobalOutOfBounds.cc' (0x6899c0) of size 400

UBSan: nullptr-dereference

// RUN: g++ -O -g -fsanitize=null null-dereference.c && ./a.out
int main(int argc, char **argv) {
  const char *ptr = nullptr;
  return *ptr;  // BOOM
}

null-dereference.c:4:10: runtime error: load of null pointer of type 'const char'
Segmentation fault (core dumped)

References

  1. "LLVM 3.1 Release Notes". LLVM. Retrieved 8 February 2014.
  2. "GCC 4.8 Release Notes". GCC. Retrieved 8 February 2014.
  3. "Address Sanitizer | Apple Developer Documentation".
  4. "Visual Studio 2019 version 16.9 Release Notes". Microsoft. Retrieved 5 March 2021.
  5. Konstantin Serebryany; Derek Bruening; Alexander Potapenko; Dmitry Vyukov. "AddressSanitizer: a fast address sanity checker" (PDF). Proceedings of the 2012 USENIX conference on Annual Technical Conference.
  6. "Hardware-assisted AddressSanitizer Design Documentation — Clang 17.0.0git documentation". clang.llvm.org.
  7. "MemorySanitizer". GitHub.
  8. "ComparisonOfMemoryTools". AddressSanitizer Wiki. Retrieved 1 December 2017.
  9. "Bypassing AddressSanitizer" (PDF). Eric Wimberley. Retrieved 1 July 2014.
  10. "KernelAddressSanitizer (KASAN)". Archived from the original on 2015-09-15.
  11. Jake Edge. "The kernel address sanitizer".
  12. Jonathan Corbet. "3.20 merge window part 2".
  13. "Kernel Address Sanitizer (KASAN)". Archived from the original on 2024-11-04.
  14. Google (2 March 2023). "sanitizers: This project is the home for Sanitizers: AddressSanitizer, MemorySanitizer, ThreadSanitizer, LeakSanitizer, and more". GitHub. Google.
  15. "sanitizer - The Rust Unstable Book". doc.rust-lang.org. This feature allows for use of one of following sanitizers: [...] ControlFlowIntegrity LLVM Control Flow Integrity
  16. "Clang Compiler User's Manual — Clang 17.0.0git documentation". clang.llvm.org. -f[no-]sanitize=check1,check2,... Turn on runtime checks for various forms of undefined or suspicious behavior
  17. "Instrumentation Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org.
  18. "Linux Kernel Sanitizers". Google. 2 March 2023.
  19. "GWP-ASan — LLVM 17.0.0git documentation". llvm.org.
  20. "libFuzzer – a library for coverage-guided fuzz testing. — LLVM 17.0.0git documentation". llvm.org.
  21. Abhishek Arya; Cris Neckar; Chrome Security Team. "Fuzzing for Security".
  22. "Securing Firefox: Trying new code analysis techniques". Archived from the original on 2016-03-07. Retrieved 2018-06-18.
  23. "Some of the bugs found by AddressSanitizer". GitHub.
  24. Mateusz Jurczyk; Gynvael Coldwind (2014-01-10). "FFmpeg and a thousand fixes". J00Ru//Vx Tech Blog.
  25. "Search results for AddressSanitizer in FreeType Bugs".