American Fuzzy Lop (software)

American Fuzzy Lop
Developer(s): Michał Zalewski
Initial release: November 12, 2013
Stable release: 2.57b / June 30, 2020 [1]
Repository: github.com/google/AFL
Written in: C, assembly
Operating system: Cross-platform
Type: Fuzzer
License: Apache License 2.0
Website: lcamtuf.coredump.cx/afl/

American Fuzzy Lop (AFL), stylized in all lowercase as american fuzzy lop, is a free software fuzzer that employs genetic algorithms to efficiently increase the code coverage of test cases. It has detected dozens of significant software bugs in major free software projects, including X.Org Server, [2] PHP, [3] OpenSSL, [4] [5] pngcrush, bash, [6] Firefox, [7] BIND, [8] [9] Qt, [10] and SQLite. [11]

For many years after its release, AFL was considered a "state of the art" fuzzer. [12] AFL is regarded as "a de-facto standard for fuzzing", [13] and its release contributed significantly to the development of fuzzing as a research area. [14] AFL is widely used in academia; academic fuzzers are often forks of AFL, and it is commonly used as a baseline against which new techniques are evaluated. [15] [16]

The source code of American fuzzy lop is published on GitHub. Its name is a reference to a breed of rabbit, the American Fuzzy Lop.

Overview

AFL requires the user to provide a sample command that runs the tested application and at least one small example input. The input can be fed to the tested program either via standard input or as an input file specified on the process command line. For example, in the case of an audio player, American fuzzy lop can be instructed to open a short sound file with it. The fuzzer then attempts to execute the specified command and, if that succeeds, tries to reduce the input file to the smallest one that triggers the same behavior. Fuzzing networked programs is not directly supported, although in some cases there are feasible workarounds. [17]

After this initial phase, AFL begins the actual process of fuzzing by applying various modifications to the input file. When the tested program crashes or hangs, this usually implies the discovery of a new bug, possibly a security vulnerability. In this case, the modified input file is saved for further user inspection.

To make fuzzing as effective as possible, American fuzzy lop expects the tested program to be compiled with the aid of a utility program that instruments the code with helper functions tracking control flow. This allows the fuzzer to detect when the target's behavior changes in response to the input. In cases where this is not possible, black-box testing is supported as well.

Fuzzing algorithm

AFL's logo from fuzzed inputs, stitched together as a single animation. [18]

Fuzzers attempt to find unexpected behaviors (i.e., bugs) in a target program by repeatedly executing the program on various inputs. AFL is a gray-box fuzzer: it injects code-coverage instrumentation into the target program at compile time and uses the coverage measurements to direct the generation of new inputs. AFL's fuzzing algorithm has influenced many subsequent gray-box fuzzers. [19] [20]

The inputs to AFL are an instrumented target program (the system under test) and a corpus, that is, a collection of inputs to the target; inputs are also known as test cases. The algorithm maintains a queue of inputs, which is initialized to the input corpus. The overall algorithm, sketched in code after the list below, works as follows: [21]

  1. Load the next input from the queue
  2. Minimize the test case
  3. Mutate the test case. If any mutant results in additional code coverage, add it to the queue. If the mutant results in a crash or hang, save it to disk for later inspection.
  4. Go to step 1
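
In C-like form, the loop can be sketched as follows. This is a minimal illustration rather than AFL's actual implementation; all helper functions (queue_next, trim, mutate, run_target, queue_add, save_crash) are hypothetical stand-ins named for clarity.

    #include <stdbool.h>

    typedef struct input input_t;                 /* a test case */
    typedef enum { RESULT_OK, RESULT_CRASH, RESULT_HANG } result_t;

    /* Hypothetical helpers, not part of AFL's actual API. */
    extern input_t *queue_next(void);             /* step 1: next queue entry   */
    extern void     trim(input_t *in);            /* step 2: minimize the input */
    extern input_t *mutate(const input_t *in);    /* step 3: derive one mutant  */
    extern result_t run_target(input_t *in, bool *new_coverage);
    extern void     queue_add(input_t *in);
    extern void     save_crash(input_t *in);

    void fuzz_loop(int mutants_per_input) {
        for (;;) {                                /* step 4: repeat forever     */
            input_t *in = queue_next(); 
            trim(in);
            for (int i = 0; i < mutants_per_input; i++) {
                bool new_coverage = false;
                input_t *mut = mutate(in);
                result_t res = run_target(mut, &new_coverage);
                if (res == RESULT_CRASH || res == RESULT_HANG)
                    save_crash(mut);              /* keep for later inspection  */
                else if (new_coverage)
                    queue_add(mut);               /* interesting: enqueue it    */
            }
        }
    }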

Mutation

To generate new inputs, AFL applies various mutations to existing inputs. [22] These mutations are mostly agnostic to the input format of the target program; they generally treat the input as a simple blob of binary data.

At first, AFL applies a deterministic sequence of mutations to each input. These are applied at various offsets in the input; the first of them is sketched in code below. They include: [23] [24]

  - Flipping (i.e., inverting) 1, 2, or 4 consecutive bits
  - Flipping a single byte, or a run of 2 or 4 consecutive bytes
  - Adding or subtracting a small integer to or from an 8-, 16-, or 32-bit value, in both little- and big-endian encodings
  - Replacing an 8-, 16-, or 32-bit value with an "interesting" constant, such as 0, -1, or the maximum representable value
  - Overwriting or inserting parts of the input with tokens from a user-supplied or automatically extracted dictionary [25]
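
The walking 1-bit flip stage, for instance, can be sketched as follows. This is an illustrative fragment rather than AFL's actual code; try_input is a hypothetical callback that executes the target on the mutated buffer.

    #include <stddef.h>
    #include <stdint.h>

    /* Walking bit flips: invert each bit in turn, hand the mutant to the
       target, then restore the bit so the buffer stays pristine. */
    static void flip_bit(uint8_t *buf, size_t bit) {
        buf[bit >> 3] ^= (uint8_t)(128u >> (bit & 7));
    }

    void walking_bitflips(uint8_t *buf, size_t len,
                          void (*try_input)(const uint8_t *, size_t)) {
        for (size_t bit = 0; bit < len * 8; bit++) {
            flip_bit(buf, bit);     /* apply the 1-bit mutation */
            try_input(buf, len);    /* execute the target       */
            flip_bit(buf, bit);     /* undo the mutation        */
        }
    }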

After applying all available deterministic mutations, AFL moves on to havoc, a stage where between 2 and 128 randomly chosen mutations are applied in a row. These mutations are any of: [22]

  - The deterministic mutations described above, applied at random positions
  - Setting random bytes to random values
  - Deleting or duplicating random blocks of bytes
  - Overwriting random blocks of the input with random data or with other parts of the input

If AFL cycles through the entire queue without generating any input that achieves new code coverage, it begins splicing. Splicing takes two inputs from the queue, truncates them at arbitrary positions, concatenates them together, and applies the havoc stage to the result.
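
A splice step can be sketched as follows. This is an illustrative fragment under stated assumptions (the out buffer is large enough to hold the result), not AFL's actual implementation.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Splicing: keep the head of input a and the tail of input b, joined
       at random cut points. The result is then passed to the havoc stage. */
    size_t splice(const uint8_t *a, size_t alen,
                  const uint8_t *b, size_t blen,
                  uint8_t *out /* at least alen + blen bytes */) {
        size_t cut_a = alen ? (size_t)rand() % alen : 0;  /* keep a[0..cut_a)    */
        size_t cut_b = blen ? (size_t)rand() % blen : 0;  /* keep b[cut_b..blen) */
        memcpy(out, a, cut_a);
        memcpy(out + cut_a, b + cut_b, blen - cut_b);
        return cut_a + (blen - cut_b);
    }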

Measuring coverage

AFL pioneered the use of binned hitcounts for measuring code coverage. [27] The author claims that this technique mitigates path explosion. [28] [29]

Conceptually, AFL counts the number of times a given execution of the target traverses each edge in the target's control-flow graph; the documentation refers to these edges as tuples and the counts as hitcounts. At the end of the execution, the hitcounts are binned or bucketed into the following eight buckets: 1, 2, 3, 4–7, 8–15, 16–31, 32–127, and 128 and greater. AFL maintains a global set of (tuple, binned count) pairs that have been produced by any execution thus far. An input is considered "interesting" and is added to the queue if it produces a (tuple, binned count) pair that is not yet in the global set.
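
The binning can be illustrated with the following function, which mirrors the bucket boundaries described above. AFL itself implements this with a precomputed 256-entry lookup table applied to its coverage map; the bitmask-style return values here follow that scheme.

    #include <stdint.h>

    /* Map a raw edge hitcount to one of AFL's eight buckets. Each bucket
       is represented by a distinct bit, so a change of bucket shows up as
       a new bit when compared against previously seen executions. */
    static uint8_t bucket_hitcount(uint32_t hits) {
        if (hits == 0)   return 0;     /* edge not taken        */
        if (hits == 1)   return 1;     /* bucket: 1             */
        if (hits == 2)   return 2;     /* bucket: 2             */
        if (hits == 3)   return 4;     /* bucket: 3             */
        if (hits <= 7)   return 8;     /* bucket: 4-7           */
        if (hits <= 15)  return 16;    /* bucket: 8-15          */
        if (hits <= 31)  return 32;    /* bucket: 16-31         */
        if (hits <= 127) return 64;    /* bucket: 32-127        */
        return 128;                    /* bucket: 128 and above */
    }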

In practice, the hitcounts are collected and processed using an efficient but lossy scheme. The compile-time instrumentation injects code that is conceptually similar to the following at each branch in the control-flow graph of the target program: [30]

cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

where <COMPILE_TIME_RANDOM> is a random integer and shared_mem is a 64 kilobyte region of memory shared between the fuzzer and the target.

This representation is more fine-grained (distinguishes between more executions) than simple block or statement coverage, but still allows for a linear-time "interestingness" test.
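
The check can be sketched as a single pass over the coverage map. This is an illustrative version; AFL's actual routine processes the map several bytes at a time, but operates on the same "virgin" map whose bytes start out with all bits set.

    #include <stddef.h>
    #include <stdint.h>

    /* Linear-time interestingness test: trace holds the bucketed hitcounts
       of one execution; virgin starts as all 0xff bytes and records which
       (tuple, bucket) pairs have ever been seen. Returns nonzero if this
       execution produced a pair not seen before. */
    int has_new_bits(const uint8_t *trace, uint8_t *virgin, size_t map_size) {
        int interesting = 0;
        for (size_t i = 0; i < map_size; i++) {
            if (trace[i] & virgin[i]) {           /* a not-yet-seen bucket bit */
                virgin[i] &= (uint8_t)~trace[i];  /* mark it as seen           */
                interesting = 1;
            }
        }
        return interesting;
    }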

Minimization

On the assumption that smaller inputs take less time to execute, AFL attempts to minimize or trim the test cases in the queue. [22] [31] Trimming works by removing blocks from the input; if the trimmed input still results in the same coverage (see Measuring coverage above), then the original input is discarded and the trimmed input is saved in the queue.
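
Trimming can be sketched as follows. This is an illustrative version under stated assumptions: run_checksum is a hypothetical helper that executes the target and returns a checksum of its coverage map, and scratch is caller-provided working space. AFL's own implementation additionally varies the block size, from roughly 1/16th down to 1/1024th of the input length.

    #include <stdint.h>
    #include <string.h>

    extern uint32_t run_checksum(const uint8_t *buf, size_t len);  /* hypothetical */

    /* Try to delete successive blocks; keep a deletion only when the
       coverage checksum is unchanged. Returns the trimmed length. */
    size_t trim(uint8_t *buf, size_t len, size_t block,
                uint8_t *scratch /* at least len bytes */) {
        uint32_t want = run_checksum(buf, len);
        size_t pos = 0;
        while (block > 0 && pos + block <= len) {
            /* candidate = buf with buf[pos .. pos+block) removed */
            memcpy(scratch, buf, pos);
            memcpy(scratch + pos, buf + pos + block, len - pos - block);
            if (run_checksum(scratch, len - block) == want) {
                memcpy(buf, scratch, len - block);  /* coverage unchanged: keep */
                len -= block;
            } else {
                pos += block;                       /* this block is needed */
            }
        }
        return len;
    }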

Scheduling

AFL selects a subset of favored inputs from the queue; non-favored inputs are skipped with some probability. [32] [27]
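
The skipping can be sketched as a probabilistic filter in the main loop. The probability shown is illustrative only; AFL's actual skip probabilities depend on whether the entry has been fuzzed before and whether favored entries are still pending.

    #include <stdlib.h>

    struct queue_entry { int favored; /* ... */ };

    /* Illustrative culling filter: always fuzz favored entries; skip
       non-favored ones most of the time. The 95% figure is an example. */
    static int should_skip(const struct queue_entry *q) {
        if (q->favored) return 0;
        return (rand() % 100) < 95;
    }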

Features

Performance features

One of the challenges American fuzzy lop had to solve involved efficiently spawning hundreds of processes per second. Apart from the original engine, which spawned every process from scratch, American fuzzy lop offers a default engine that relies heavily on the fork system call: the target is initialized once and then cloned for each execution. [33] [27] This can further be sped up by the LLVM deferred fork server mode or the similar persistent mode, at the cost of having to modify the tested program. [34] American fuzzy lop also supports distributing fuzzing of the same program across machines over the network.
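
A persistent-mode harness can be sketched as follows, assuming it is compiled with afl-clang-fast, which provides the __AFL_LOOP macro; check_input is a hypothetical stand-in for the code under test, and the fallback macro simply runs the body once in ordinary builds.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef __AFL_LOOP                 /* ordinary build: run the body once */
    static int afl_stub_iter = 1;
    #define __AFL_LOOP(n) (afl_stub_iter-- > 0)
    #endif

    /* Hypothetical stand-in for the code under test: aborts on a magic
       prefix so the fuzzer has something to find. */
    static void check_input(const uint8_t *buf, size_t len) {
        if (len >= 4 && memcmp(buf, "FUZZ", 4) == 0)
            abort();
    }

    int main(void) {
        static uint8_t buf[4096];
        /* Under afl-clang-fast, __AFL_LOOP keeps the process alive across
           many inputs, avoiding one fork per execution. */
        while (__AFL_LOOP(1000)) {
            ssize_t len = read(0, buf, sizeof buf);   /* input on stdin */
            if (len > 0)
                check_input(buf, (size_t)len);
        }
        return 0;
    }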

User interface

American fuzzy lop features a colorful command line interface that displays real-time statistics about the fuzzing process. Various settings can be controlled via command line options or environment variables. In addition, runtime statistics are written to files in a machine-readable format so that other programs can consume them.

Utility programs

In addition to afl-fuzz and tools for binary instrumentation, American fuzzy lop features utility programs for monitoring the fuzzing process, as well as afl-cmin and afl-tmin, which perform corpus-wide and single test case minimization, respectively. This is useful when the test cases generated by afl-fuzz are to be consumed by other fuzzers.

Forks

AFL has been forked many times in order to examine new fuzzing techniques, or to apply fuzzing to different kinds of programs. A few notable forks and derivatives include:

  - MOpt-AFL, which schedules mutation operators adaptively [35]
  - AFLFast, which models coverage-based greybox fuzzing as a Markov chain and biases scheduling toward rarely exercised paths [36]
  - AFLSmart, which adds input-format awareness ("smart" greybox fuzzing) [37]
  - AFLGo, which directs the search toward specified program locations [38]
  - SymCC, a compile-time symbolic execution tool designed to complement AFL-style fuzzers [39]
  - WinAFL, a fork maintained by Google Project Zero for fuzzing Windows binaries [40]

AFL++

AFL++
Initial release: 2.52c / June 5, 2019
Stable release: 4.08c / August 10, 2023 [41]
Repository: github.com/AFLplusplus/AFLplusplus
Website: aflplus.plus

AFL++ (AFLplusplus) [42] is a community-maintained fork of AFL created due to the relative inactivity of Google's upstream AFL development since September 2017. It includes new features and speedups. [43]

Google's OSS-Fuzz initiative, which provides free fuzzing services to open source software, replaced its AFL option with AFL++ in January 2021. [44] [45]


References

Notes

  1. "Releases - google/AFL" . Retrieved January 19, 2021 via GitHub.
  2. "Advisory-2015-03-17". x.org.
  3. "NVD - Detail". nist.gov.
  4. "NVD - Detail". nist.gov.
  5. "NVD - Detail". nist.gov.
  6. "CVE - CVE-2014-6278". mitre.org.
  7. "CVE - CVE-2014-8637". mitre.org.
  8. "How to fuzz a server with American Fuzzy Lop". Fastly. July 21, 2015.
  9. "CVE - CVE-2015-5477". mitre.org.
  10. "[Announce] Qt Project Security Advisory - Multiple Vulnerabilities in Qt Image Format Handling". qt-project.org. April 13, 2015.
  11. "How SQLite Is Tested # 4.1.1. SQL Fuzz Using The American Fuzzy Lop Fuzzer". sqlite.org.
  12. Poncelet, Clement; Sagonas, Konstantinos; Tsiftes, Nicolas (January 5, 2023). "So Many Fuzzers, So Little Time✱". Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. ASE '22. New York, NY, USA: Association for Computing Machinery. pp. 1–12. doi:10.1145/3551349.3556946. ISBN   978-1-4503-9475-8. S2CID   253456740.
  13. Fioraldi et al. 2023, p. 2.
  14. Fioraldi, Andrea; Maier, Dominik Christian; Zhang, Dongjia; Balzarotti, Davide (November 7, 2022). "LibAFL". Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. CCS '22. New York, NY, USA: Association for Computing Machinery. pp. 1051–1065. doi:10.1145/3548606.3560602. ISBN   978-1-4503-9450-5. S2CID   253410747. "The release of AFL marked an important milestone in the area of software security testing, revitalizing fuzzing as a major research topic".
  15. Hazimeh, Ahmad; Herrera, Adrian; Payer, Mathias (June 15, 2021). "Magma: A Ground-Truth Fuzzing Benchmark". Proceedings of the ACM on Measurement and Analysis of Computing Systems. 4 (3): 49:1–49:29. arXiv: 2009.01120 . doi:10.1145/3428334. S2CID   227230949.
  16. Metzman et al. 2021.
  17. Technion. "Fuzzing nginx - Hunting vulnerabilities with afl-fuzz". lolware.net.
  18. Zalewski, Michał (February 27, 2015). "Logo for afl-fuzz". afl-users | Google Groups. Retrieved July 25, 2019.
  19. Fioraldi et al. 2023.
  20. Chen, Peng; Chen, Hao (May 2018). "Angora: Efficient Fuzzing by Principled Search". 2018 IEEE Symposium on Security and Privacy (SP). pp. 711–725. doi:10.1109/SP.2018.00046. ISBN   978-1-5386-4353-2. S2CID   3729194.
  21. "Motivation behind AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 26, 2023.
  22. Fioraldi et al. 2023, p. 6.
  23. "Binary fuzzing strategies: what works, what doesn't". lcamtuf.blogspot.com. August 8, 2014.
  24. "AFL User Guide — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 26, 2023.
  25. "Finding bugs in SQLite, the easy way". lcamtuf.blogspot.com. April 14, 2015.
  26. Manès, Valentin J.M.; Han, HyungSeok; Han, Choongwoo; Cha, Sang Kil; Egele, Manuel; Schwartz, Edward J.; Woo, Maverick (November 2021). "The Art, Science, and Engineering of Fuzzing: A Survey". IEEE Transactions on Software Engineering. 47 (11): 2312–2331. arXiv: 1812.00140 . doi:10.1109/TSE.2019.2946563. ISSN   1939-3520. S2CID   102351047.
  27. Fioraldi et al. 2023, p. 5.
  28. "Technical "whitepaper" for afl-fuzz".
  29. "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 27, 2023. "This approach allows for a very fine-grained and long-term exploration of program state while not having to perform any computationally intensive and fragile global comparisons of complex execution traces, and while avoiding the scourge of path explosion."
  30. "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 27, 2023.
  31. "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 27, 2023.
  32. "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved February 27, 2023.
  33. "Fuzzing random programs without execve()". lcamtuf.blogspot.com. October 14, 2014.
  34. "New in AFL: persistent mode". lcamtuf's blog. June 11, 2015.
  35. Lyu, Chenyang; Ji, Shouling; Zhang, Chao; Li, Yuwei; Lee, Wei-Han; Song, Yu; Beyah, Raheem (2019). "MOPT: Optimized Mutation Scheduling for Fuzzers". 28th USENIX Security Symposium (USENIX Security 19). pp. 1949–1966. ISBN   978-1-939133-06-9.
  36. Böhme, Marcel; Pham, Van-Thuan; Roychoudhury, Abhik (May 2019). "Coverage-Based Greybox Fuzzing as Markov Chain". IEEE Transactions on Software Engineering. 45 (5): 489–506. doi:10.1109/TSE.2017.2785841. ISSN   1939-3520.
  37. Pham, Van-Thuan; Böhme, Marcel; Santosa, Andrew E.; Căciulescu, Alexandru Răzvan; Roychoudhury, Abhik (September 2021). "Smart Greybox Fuzzing". IEEE Transactions on Software Engineering. 47 (9): 1980–1997. doi:10.1109/TSE.2019.2941681. ISSN   1939-3520. S2CID   53721813.
  38. Böhme, Marcel; Pham, Van-Thuan; Nguyen, Manh-Dung; Roychoudhury, Abhik (October 30, 2017). "Directed Greybox Fuzzing". Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS '17. New York, NY, USA: Association for Computing Machinery. pp. 2329–2344. doi:10.1145/3133956.3134020. ISBN   978-1-4503-4946-8. S2CID   29430742.
  39. Poeplau, Sebastian; Francillon, Aurélien (2020). "Symbolic execution with SymCC: Don't interpret, compile!". 29th USENIX Security Symposium (USENIX Security 20). pp. 181–198. ISBN   978-1-939133-17-5.
  40. WinAFL, Google Project Zero, February 23, 2023, retrieved February 26, 2023
  41. "Releases - AFLplusplus/AFLplusplus" . Retrieved November 1, 2023 via GitHub.
  42. Fioraldi, Andrea; Maier, Dominik; Eißfeldt, Heiko; Heuse, Marc (August 2020). AFL++: Combining incremental steps of fuzzing research. 14th USENIX Workshop on Offensive Technologies (WOOT 20).
  43. "The AFL++ fuzzing framework". AFLplusplus.
  44. metzman, jonathan. "[afl++] Use AFL++ instead of AFL for fuzzing. by jonathanmetzman · Pull Request #5046 · google/oss-fuzz". GitHub.
  45. Metzman et al. 2021, p. 1394.

Sources

Further reading