Uncontrolled format string is a type of code injection vulnerability discovered around 1989 that can be used in security exploits. [1] Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf()
. A malicious user may use the %s
and %x
format tokens, among others, to print data from the call stack or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n
format token, which commands printf()
and similar functions to write the number of bytes formatted to an address stored on the stack.
A typical exploit uses a combination of these techniques to take control of the instruction pointer (IP) of a process, [2] for example by forcing a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output and the %x
token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n
format token can then overwrite with the address of the malicious code to execute.
This is a common vulnerability because format bugs were previously thought harmless and resulted in vulnerabilities in many common tools. MITRE's CVE project lists roughly 500 vulnerable programs as of June 2007, and a trend analysis ranks it the 9th most-reported vulnerability type between 2001 and 2006. [3]
Format string bugs most commonly appear when a programmer wishes to output a string containing user supplied data (either to a file, to a buffer, or to the user). The programmer may mistakenly write printf(buffer)
instead of printf("%s", buffer)
. The first version interprets buffer
as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended. Both versions behave identically in the absence of format specifiers in the string, which makes it easy for the mistake to go unnoticed by the developer.
Format bugs arise because C's argument passing conventions are not type-safe. In particular, the varargs
mechanism allows functions to accept any number of arguments (e.g. printf
) by "popping" as many arguments off the call stack as they wish, trusting the early arguments to indicate how many additional arguments are to be popped, and of what types.
Format string bugs can occur in other programming languages besides C, such as Perl, although they appear with less frequency and usually cannot be exploited to execute code of the attacker's choice. [4]
Format bugs were first noted in 1989 by the fuzz testing work done at the University of Wisconsin, which discovered an "interaction effect" in the C shell (csh) between its command history mechanism and an error routine that assumed safe string input. [5]
The use of format string bugs as an attack vector was discovered in September 1999 by Tymm Twillman during a security audit of the ProFTPD daemon. [6] The audit uncovered an snprintf
that directly passed user-generated data without a format string. Extensive tests with contrived arguments to printf-style functions showed that use of this for privilege escalation was possible. This led to the first posting in September 1999 on the Bugtraq mailing list regarding this class of vulnerabilities, including a basic exploit. [6] It was still several months, however, before the security community became aware of the full dangers of format string vulnerabilities as exploits for other software using this method began to surface. The first exploits that brought the issue to common awareness (by providing remote root access via code execution) were published simultaneously on the Bugtraq list in June 2000 by Przemysław Frasunek [7] and a person using the nickname tf8. [8] They were shortly followed by an explanation, posted by a person using the nickname lamagra. [9] "Format bugs" was posted to the Bugtraq list by Pascal Bouchareine in July 2000. [10] The seminal paper "Format String Attacks" [11] by Tim Newsham was published in September 2000 and other detailed technical explanation papers were published in September 2001 such as Exploiting Format String Vulnerabilities, by team Teso. [2]
Many compilers can statically check format strings and produce warnings for dangerous or suspect formats. In the GNU Compiler Collection, the relevant compiler flags are, -Wall
,-Wformat
, -Wno-format-extra-args
, -Wformat-security
, -Wformat-nonliteral
, and -Wformat=2
. [12]
Most of these are only useful for detecting bad format strings that are known at compile-time. If the format string may come from the user or from a source external to the application, the application must validate the format string before using it. Care must also be taken if the application generates or selects format strings on the fly. If the GNU C library is used, the -D_FORTIFY_SOURCE=2
parameter can be used to detect certain types of attacks occurring at run-time. The -Wformat-nonliteral
check is more stringent.
Contrary to many other security issues, the root cause of format string vulnerabilities is relatively easy to detect in x86-compiled executables: For printf
-family functions, proper use implies a separate argument for the format string and the arguments to be formatted. Faulty uses of such functions can be spotted by simply counting the number of arguments passed to the function; an "argument deficiency" [2] is then a strong indicator that the function was misused.
Counting the number of arguments is often made easy on x86 due to a calling convention where the caller removes the arguments that were pushed onto the stack by adding to the stack pointer after the call, so a simple examination of the stack correction yields the number of arguments passed to the printf
-family function.' [2]
printf
scanf
In programming and information security, a buffer overflow or buffer overrun is an anomaly whereby a program writes data to a buffer beyond the buffer's allocated memory, overwriting adjacent memory locations.
Defensive programming is a form of defensive design intended to develop programs that are capable of detecting potential security abnormalities and make predetermined responses. It ensures the continuing function of a piece of software under unforeseen circumstances. Defensive programming practices are often used where high availability, safety, or security is needed.
In computing, a polyglot is a computer program or script written in a valid form of multiple programming languages or file formats. The name was coined by analogy to multilingualism. A polyglot file is composed by combining syntax from two or more different formats.
The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. Starting from the original ANSI C standard, it was developed at the same time as the C library POSIX specification, which is a superset of it. Since ANSI C was adopted by the International Organization for Standardization, the C standard library is also called the ISO C library.
The printf family of functions in the C programming language are a set of functions that take a format string as input among a variable sized list of other values and produce as output a string that corresponds to the format specifier and given input values. The string is written in a simple template language: characters are usually copied literally into the function's output, but format specifiers, which start with a %
character, indicate the location and method to translate a piece of data to characters. The design has been copied to expose similar functionality in other programming languages.
Buffer overflow protection is any of various techniques used during software development to enhance the security of executable programs by detecting buffer overflows on stack-allocated variables, and preventing them from causing program misbehavior or from becoming serious security vulnerabilities. A stack buffer overflow occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer. Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. This almost always results in corruption of adjacent data on the stack, which could lead to program crashes, incorrect operation, or security issues.
A "return-to-libc" attack is a computer security attack usually starting with a buffer overflow in which a subroutine return address on a call stack is replaced by an address of a subroutine that is already present in the process executable memory, bypassing the no-execute bit feature and ridding the attacker of the need to inject their own code. The first example of this attack in the wild was contributed by Alexander Peslyak on the Bugtraq mailing list in 1997.
Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce code into a vulnerable computer program and change the course of execution. The result of successful code injection can be disastrous, for example, by allowing computer viruses or computer worms to propagate.
In computer security, arbitrary code execution (ACE) is an attacker's ability to run any commands or code of the attacker's choice on a target machine or in a target process. An arbitrary code execution vulnerability is a security flaw in software or hardware allowing arbitrary code execution. A program that is designed to exploit such a vulnerability is called an arbitrary code execution exploit. The ability to trigger arbitrary code execution over a network is often referred to as remote code execution (RCE).
A directory traversal attack exploits insufficient security validation or sanitization of user-supplied file names, such that characters representing "traverse to parent directory" are passed through to the operating system's file system API. An affected application can be exploited to gain unauthorized access to the file system.
The OpenBSD operating system focuses on security and the development of security features. According to author Michael W. Lucas, OpenBSD "is widely regarded as the most secure operating system available anywhere, under any licensing terms."
A scanf format string is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scanning functions are often supplied in standard libraries. Scanf is a function that reads formatted data from the standard input string, which is usually the keyboard and writes the results whenever called in the specified arguments.
In software, a stack buffer overflow or stack buffer overrun occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer. Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. This almost always results in corruption of adjacent data on the stack, and in cases where the overflow was triggered by mistake, will often cause the program to crash or operate incorrectly. Stack buffer overflow is a type of the more general programming malfunction known as buffer overflow. Overfilling a buffer on the stack is more likely to derail program execution than overfilling a buffer on the heap because the stack contains the return addresses for all active function calls.
Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.
Przemysław Frasunek is a "white hat" hacker from Poland. He has been a frequent Bugtraq poster since late in the 1990s, noted for one of the first published successful software exploits for the format string bug class of attacks, just after the first exploit of the person using nickname tf8. Until that time the vulnerability was thought harmless. He serves as the CEO of Redge Technologies.
Secure coding is the practice of developing computer software in such a way that guards against the accidental introduction of security vulnerabilities. Defects, bugs and logic flaws are consistently the primary cause of commonly exploited software vulnerabilities. Through the analysis of thousands of reported vulnerabilities, security professionals have discovered that most vulnerabilities stem from a relatively small number of common software programming errors. By identifying the insecure coding practices that lead to these errors and educating developers on secure alternatives, organizations can take proactive steps to help significantly reduce or eliminate vulnerabilities in software before deployment.
Return-oriented programming (ROP) is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing.
Shellshock, also known as Bashdoor, is a family of security bugs in the Unix Bash shell, the first of which was disclosed on 24 September 2014. Shellshock could enable an attacker to cause Bash to execute arbitrary commands and gain unauthorized access to many Internet-facing services, such as web servers, that use Bash to process requests.
American Fuzzy Lop (AFL), stylized in all lowercase as american fuzzy lop, is a free software fuzzer that employs genetic algorithms in order to efficiently increase code coverage of the test cases. So far it has detected dozens of significant software bugs in major free software projects, including X.Org Server, PHP, OpenSSL, pngcrush, bash, Firefox, BIND, Qt, and SQLite.
Trojan Source is the name of a software vulnerability that abuses Unicode's bidirectional characters to display source code differently than the actual execution of the source code. The exploit utilizes how writing scripts of different reading directions are displayed and encoded on computers. It was discovered by Nicholas Boucher and Ross Anderson at Cambridge University in late 2021.