Differential testing

Differential testing, [1] also known as differential fuzzing, is a software testing technique that detects bugs by providing the same input to a series of similar applications (or to different implementations of the same application) and observing differences in their execution. Differential testing complements traditional software testing because it is well-suited to finding semantic or logic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Differential testing is also called back-to-back testing.

Differential testing finds semantic bugs by using different implementations of the same functionality as cross-referencing oracles: any discrepancy between the programs' behaviors on the same input is marked as a potential bug.
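A minimal sketch of this oracle in Python; the naive_int re-implementation and the random digit-heavy corpus are hypothetical stand-ins for real test targets and inputs:

```python
import random
import string

def differential_test(implementations, inputs):
    """Run every implementation on every input and report discrepancies.

    Exceptions are captured as observable behavior, so crash-vs-no-crash
    differences are also flagged."""
    discrepancies = []
    for inp in inputs:
        outcomes = []
        for impl in implementations:
            try:
                outcomes.append(("ok", impl(inp)))
            except Exception as exc:
                outcomes.append(("error", type(exc).__name__))
        if len(set(outcomes)) > 1:          # implementations disagree
            discrepancies.append((inp, outcomes))
    return discrepancies

# Hypothetical pair of implementations of the same functionality:
# Python's built-in int() and a deliberately naive re-implementation.
def naive_int(s):
    value = 0
    for ch in s:
        value = value * 10 + (ord(ch) - ord("0"))
    return value

random.seed(0)
corpus = ["".join(random.choices(string.digits + "+- ", k=5)) for _ in range(1000)]
for inp, outcomes in differential_test([int, naive_int], corpus)[:5]:
    print(repr(inp), outcomes)
```

Inputs such as "+12" or " 34" expose the discrepancy: int() handles signs and whitespace while the naive variant silently computes a wrong value.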

Application domains

Differential testing has been used to find semantic bugs successfully in diverse domains like SSL/TLS implementations, [2] [3] [4] [5] C compilers, [6] JVM implementations, [7] Web application firewalls, [8] security policies for APIs, [9] antivirus software, [4] [10] and file systems. [11] Differential testing has also been used for automated fingerprint generation from different network protocol implementations. [12]

Input generation

Unguided

Unguided differential testing tools generate test inputs independently across iterations, without considering the test programs' behavior on past inputs. Because such a process draws new inputs essentially at random from a prohibitively large input space, it can make testing highly inefficient: a large number of inputs must be generated to find a single bug.
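A minimal sketch of such a generator, with an arbitrary length bound as the only assumption:

```python
import os
import random

def unguided_inputs(n, max_len=64):
    """Draw n inputs uniformly at random; no information about the test
    programs' behavior on earlier inputs is used."""
    for _ in range(n):
        yield os.urandom(random.randint(0, max_len))
```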

An example of a differential testing system that performs unguided input generation is "Frankencerts". [2] This work synthesizes Frankencerts by randomly combining parts of real certificates. It uses syntactically valid certificates to test for semantic violations of SSL/TLS certificate validation across multiple implementations. However, since the creation and selection of Frankencerts are completely unguided, it is significantly less efficient than guided tools.
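A minimal sketch of the recombination idea, with certificates modeled as flat field dictionaries rather than real X.509 structures:

```python
import random

# Toy seed "certificates"; real Frankencerts recombine parsed X.509 parts.
SEED_CERTS = [
    {"issuer": "CA-A", "not_after": "2020-01-01", "key_usage": "digitalSignature", "ca": False},
    {"issuer": "CA-B", "not_after": "2035-01-01", "key_usage": "keyCertSign", "ca": True},
    {"issuer": "CA-C", "not_after": "1999-12-31", "key_usage": "digitalSignature", "ca": False},
]

def frankencert(seeds, rng=None):
    """Build a syntactically plausible certificate by drawing each field
    from a randomly chosen seed certificate (unguided recombination)."""
    rng = rng or random.Random()
    return {field: rng.choice(seeds)[field] for field in seeds[0]}

print(frankencert(SEED_CERTS))   # a chimera mixing fields from different seeds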

Guided

A guided input generation process aims to minimize the number of inputs needed to find each bug by taking into account program behavior on past inputs.
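A minimal sketch of such a feedback loop, with the domain-specific pieces (mutate, coverage_of) left as assumed callbacks:

```python
import random

def run(impl, inp):
    """Execute one implementation, capturing crashes as observable behavior."""
    try:
        return ("ok", impl(inp))
    except Exception as exc:
        return ("error", type(exc).__name__)

def guided_fuzz(implementations, seeds, mutate, coverage_of, budget=10_000):
    """Sketch of guided input generation: an input is kept (and mutated
    further) only if it exercises behavior not seen before. `mutate` and
    `coverage_of` are assumed, domain-specific callbacks; `coverage_of`
    stands in for instrumentation reporting which code the test programs
    executed on a given input."""
    corpus = list(seeds)
    seen = set()
    disagreements = []
    for _ in range(budget):
        candidate = mutate(random.choice(corpus))
        behaviors = {run(impl, candidate) for impl in implementations}
        new_coverage = coverage_of(candidate) - seen
        if new_coverage:                 # feedback: keep interesting inputs
            seen |= new_coverage
            corpus.append(candidate)
        if len(behaviors) > 1:           # implementations disagree
            disagreements.append(candidate)
    return disagreements
```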

Domain-specific evolutionary guidance

An example of a differential testing system that performs domain-specific coverage-guided input generation is Mucerts. [3] Mucerts relies on knowledge of a partial grammar of the X.509 certificate format and uses a stochastic sampling algorithm to drive its input generation while tracking program coverage.
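A minimal sketch of the grammar-driven part of this approach, using a toy probabilistic grammar rather than Mucerts' actual X.509 grammar or its coverage-tracking sampler:

```python
import random

# Toy production rules; Mucerts' real grammar covers X.509, and its
# sampler is additionally steered by program coverage (omitted here).
GRAMMAR = {
    "cert": [["version", "validity", "extensions"]],
    "version": [["v1"], ["v3"]],
    "validity": [["valid-dates"], ["expired-dates"]],
    "extensions": [["basic-constraints"], ["key-usage"],
                   ["basic-constraints", "key-usage"]],
}

def sample(symbol, rng):
    """Expand a symbol by stochastically choosing among its productions."""
    if symbol not in GRAMMAR:
        return [symbol]                       # terminal token
    production = rng.choice(GRAMMAR[symbol])
    return [tok for part in production for tok in sample(part, rng)]

print(sample("cert", random.Random(1)))       # one random certificate skeleton
```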

Another line of research builds on the observation that generating new inputs from existing ones can be modeled as a stochastic process. An example of a differential testing tool that uses such stochastic modeling for input generation is Chen et al.'s JVM tester. [7] It performs differential testing of Java virtual machine (JVM) implementations using Markov chain Monte Carlo (MCMC) sampling for input generation, with custom domain-specific mutations that leverage detailed knowledge of the Java class file format.
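A minimal sketch of the Metropolis-Hastings acceptance step such a tool could use; the score and propose callbacks (e.g., coverage gain and class-file mutation) are assumptions standing in for the tool's actual components:

```python
import math
import random

def mcmc_inputs(seed, score, propose, steps=1000, temperature=1.0):
    """Markov chain Monte Carlo (Metropolis-Hastings) input selection:
    a proposed mutant replaces the current input with probability
    min(1, exp(delta / T)), where delta is its gain in a fitness score
    (e.g., coverage). `score` and `propose` are assumed callbacks."""
    current, current_score = seed, score(seed)
    for _ in range(steps):
        mutant = propose(current)             # e.g., a class-file mutation
        mutant_score = score(mutant)
        delta = mutant_score - current_score
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current, current_score = mutant, mutant_score
        yield current                         # next input to test differentially
```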

Domain-independent evolutionary guidance

NEZHA [4] is an example of a differential testing tool that has a path selection mechanism geared towards domain-independent differential testing. It uses specific metrics (dubbed delta-diversity) that summarize and quantify the observed asymmetries between the behaviors of multiple test applications. Such metrics, which promote the relative diversity of observed program behavior, have been shown to be effective for applying differential testing in a domain-independent and black-box manner.
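A minimal sketch of guidance by relative output diversity: the feedback signal is the tuple of per-program behaviors on an input rather than code coverage, so only black-box observations are needed. It reuses the run() helper from the guided-generation sketch above; `mutate` is an assumed generic byte-level mutator.

```python
import random

def delta_diversity_fuzz(implementations, seeds, mutate, budget=10_000):
    """Keep an input if the pattern of behaviors it induces across all
    tested programs (its output tuple) has not been seen before; inputs
    on which the programs disagree are reported as potential bugs."""
    corpus = list(seeds)
    seen_tuples = set()
    disagreements = []
    for _ in range(budget):
        inp = mutate(random.choice(corpus))
        output_tuple = tuple(run(impl, inp) for impl in implementations)
        if output_tuple not in seen_tuples:   # new cross-program behavior
            seen_tuples.add(output_tuple)
            corpus.append(inp)
        if len(set(output_tuple)) > 1:        # programs disagree on this input
            disagreements.append(inp)
    return disagreements
```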

Automata-learning-based guidance

For applications such as cross-site scripting (XSS) filters and X.509 certificate hostname verification, which can be modeled accurately with finite-state automata (FSA), counter-example-driven FSA learning techniques can be used to generate inputs that are more likely to find bugs. [8] [5]
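To make the automata angle concrete, the toy sketch below searches the product of two DFAs for a shortest distinguishing input; the triple-based DFA encoding and the two example filters are assumptions for illustration, not the representation used by the cited tools.

```python
from collections import deque

def distinguishing_input(dfa_a, dfa_b, alphabet):
    """Find a shortest string on which two (learned) DFAs disagree, via
    BFS over the product automaton; None means the models are equivalent.
    Each DFA is a (start, transitions, accepting) triple with transitions
    keyed by (state, symbol)."""
    start = (dfa_a[0], dfa_b[0])
    queue, visited = deque([(start, "")]), {start}
    while queue:
        (sa, sb), word = queue.popleft()
        if (sa in dfa_a[2]) != (sb in dfa_b[2]):
            return word                          # counterexample found
        for ch in alphabet:
            nxt = (dfa_a[1][sa, ch], dfa_b[1][sb, ch])
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, word + ch))
    return None

# Toy "filters": A accepts strings containing "ab"; B requires "ab" twice.
A = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2}, {2})
B = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 3, (2, "b"): 2, (3, "a"): 3, (3, "b"): 4,
         (4, "a"): 4, (4, "b"): 4}, {4})
print(distinguishing_input(A, B, "ab"))          # -> "ab"
```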

Symbolic-execution-based guidance

Symbolic execution [13] is a white-box technique that executes a program symbolically, computes constraints along different paths, and uses a constraint solver to generate inputs that satisfy the collected constraints along each path. Symbolic execution can also be used to generate inputs for differential testing. [12] [14]
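As an illustration of the idea, the sketch below uses the Z3 SMT solver to carry out the constraint-solving step directly: the two absolute-value encodings stand in for symbolic summaries of two implementations, and the solver is asked for an input on which they diverge. The buggy abs_b variant is invented for the example.

```python
# A minimal sketch, assuming the z3-solver package (pip install z3-solver).
from z3 import If, Int, Solver, sat

x = Int("x")

# Symbolic encodings of two "implementations" of absolute value;
# abs_b carries a seeded off-by-one bug on non-positive inputs.
abs_a = If(x >= 0, x, -x)
abs_b = If(x > 0, x, -x - 1)

solver = Solver()
solver.add(abs_a != abs_b)          # ask for an input where the outputs differ
if solver.check() == sat:
    print("implementations disagree at x =", solver.model()[x])
```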

The inherent limitations of symbolic-execution-assisted testing tools, namely path explosion and poor scalability, are magnified in the context of differential testing, where multiple test programs are analyzed together. It is therefore very hard to scale symbolic execution techniques to the differential testing of multiple large programs.

Related Research Articles

Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network, such as the Internet. The protocol is widely used in applications such as email, instant messaging, and voice over IP, but its use in securing HTTPS remains the most publicly visible.

Unit testing, a.k.a. component or module testing, is a form of software testing by which isolated source code is tested to validate expected behavior.

In cryptography, Camellia is a symmetric key block cipher with a block size of 128 bits and key sizes of 128, 192 and 256 bits. It was jointly developed by Mitsubishi Electric and NTT of Japan. The cipher has been approved for use by the ISO/IEC, the European Union's NESSIE project and the Japanese CRYPTREC project. The cipher has security levels and processing abilities comparable to the Advanced Encryption Standard.

In computer science, symbolic execution (also symbolic evaluation or symbex) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch. Finally, the possible inputs that trigger a branch can be determined by solving the constraints.

The security of cryptographic systems depends on some secret data that is known to authorized persons but unknown and unpredictable to others. To achieve this unpredictability, some randomization is typically employed. Modern cryptographic protocols often require frequent generation of random quantities. Cryptographic attacks that subvert or exploit weaknesses in this process are known as random number generator attacks.

Fuzzing

In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.

Java Pathfinder (JPF) is a system to verify executable Java bytecode programs. JPF was developed at the NASA Ames Research Center and open sourced in 2005. The acronym JPF is not to be confused with the unrelated Java Plugin Framework project.

Simon S. Lam

Simon S. Lam is an American computer scientist and Internet pioneer. He retired in 2018 from The University of Texas at Austin as Professor Emeritus and Regents' Chair Emeritus in Computer Science #1. He made seminal and important contributions to transport layer security, packet network verification, as well as network protocol design, verification, and performance analysis.

In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of quantum mechanics, which states that the act of observing a system inevitably alters its state. In electronics, the traditional term is probe effect, where attaching a test probe to a device changes its behavior.

Dynamic program analysis is the act of analyzing software that involves executing a program – as opposed to static program analysis, which does not execute it.

Search-based software engineering (SBSE) applies metaheuristic search techniques such as genetic algorithms, simulated annealing and tabu search to software engineering problems. Many activities in software engineering can be stated as optimization problems. Optimization techniques of operations research such as linear programming or dynamic programming are often impractical for large scale software engineering problems because of their computational complexity or their assumptions on the problem structure. Researchers and practitioners use metaheuristic search techniques, which impose little assumptions on the problem structure, to find near-optimal or "good-enough" solutions.

In computing, compiler correctness is the branch of computer science that deals with trying to show that a compiler behaves according to its language specification. Techniques include developing the compiler using formal methods and using rigorous testing on an existing compiler.

In engineering, debugging is the process of finding the root cause of and workarounds and possible fixes for bugs.

Concolic testing is a hybrid software verification technique that performs symbolic execution, a classical technique that treats program variables as symbolic variables, along a concrete execution path. Symbolic execution is used in conjunction with an automated theorem prover or constraint solver based on constraint logic programming to generate new concrete inputs with the aim of maximizing code coverage. Its main focus is finding bugs in real-world software, rather than demonstrating program correctness.

American Fuzzy Lop (software)

American Fuzzy Lop (AFL), stylized in all lowercase as american fuzzy lop, is a free software fuzzer that employs genetic algorithms in order to efficiently increase code coverage of the test cases. So far it has detected dozens of significant software bugs in major free software projects, including X.Org Server, PHP, OpenSSL, pngcrush, bash, Firefox, BIND, Qt, and SQLite.

EvoSuite is a tool that automatically generates unit tests for Java software. EvoSuite uses an evolutionary algorithm to generate JUnit tests. EvoSuite can be run from the command line, and it also has plugins to integrate it in Maven, IntelliJ and Eclipse. EvoSuite has been used on more than a hundred open-source software and several industrial systems, finding thousands of potential bugs.

Automatic bug-fixing is the automatic repair of software bugs without the intervention of a human programmer. It is also commonly referred to as automatic patch generation, automatic bug repair, or automatic program repair. The typical goal of such techniques is to automatically generate correct patches to eliminate bugs in software programs without causing software regression.

Dawson R. Engler is an American computer scientist and an associate professor of computer science and electrical engineering at Stanford University.

Secure Network Programming

Secure Network Programming (SNP) is a prototype of the first Secure Sockets Layer, designed and built in 1993 by the Networking Research Laboratory at the University of Texas at Austin, led by Simon S. Lam. This work was published in the 1994 USENIX Summer Technical Conference. For this project, the authors won the 2004 ACM Software System Award.

Static application security testing (SAST) is used to secure software by reviewing its source code to identify sources of vulnerabilities. Although static analysis of source code has existed as long as computers have, the technique spread to security in the late 1990s, with the first public discussion of SQL injection appearing in 1998 as Web applications integrated new technologies like JavaScript and Flash.

References

  1. William M. McKeeman, “Differential testing for software,” Digital Technical Journal, vol. 10, no. 1, pp. 100–107, 1998.
  2. C. Brubaker, S. Jana, B. Ray, S. Khurshid, and V. Shmatikov, “Using Frankencerts for automated adversarial testing of certificate validation in SSL/TLS implementations,” in Proceedings of the 2014 IEEE Symposium on Security and Privacy (S&P). IEEE Computer Society, 2014, pp. 114–129.
  3. Y. Chen and Z. Su, “Guided differential testing of certificate validation in SSL/TLS implementations,” in Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (FSE). ACM, 2015, pp. 793–804.
  4. T. Petsios, A. Tang, S. Stolfo, A. D. Keromytis, and S. Jana, “NEZHA: Efficient domain-independent differential testing,” in Proceedings of the 38th IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2017.
  5. S. Sivakorn, G. Argyros, K. Pei, A. D. Keromytis, and S. Jana, “HVLearn: Automated black-box analysis of hostname verification in SSL/TLS implementations,” in Proceedings of the 2017 IEEE Symposium on Security and Privacy (S&P), San Jose, CA, 2017, pp. 521–538.
  6. X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in C compilers,” in Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 2011, pp. 283–294.
  7. Y. Chen, T. Su, C. Sun, Z. Su, and J. Zhao, “Coverage-directed differential testing of JVM implementations,” in Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 2016, pp. 85–99.
  8. G. Argyros, I. Stais, S. Jana, A. D. Keromytis, and A. Kiayias, “SFADiff: Automated evasion attacks and fingerprinting using black-box differential automata learning,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2016, pp. 1690–1701.
  9. V. Srivastava, M. D. Bond, K. S. McKinley, and V. Shmatikov, “A security policy oracle: Detecting security holes using multiple API implementations,” ACM SIGPLAN Notices, vol. 46, no. 6, pp. 343–354, 2011.
  10. S. Jana and V. Shmatikov, “Abusing file processing in malware detectors for fun and profit,” in Proceedings of the 2012 IEEE Symposium on Security and Privacy (S&P). IEEE Computer Society, 2012, pp. 80–94.
  11. Y. Liu, M. Adkar, G. Holzmann, G. Kuenning, P. Liu, S. A. Smolka, W. Su, and E. Zadok, “Metis: File system model checking via versatile input and state exploration,” in 22nd USENIX Conference on File and Storage Technologies (FAST ’24). USENIX Association, 2024, pp. 123–140.
  12. D. Brumley, J. Caballero, Z. Liang, J. Newsome, and D. Song, “Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation,” in 16th USENIX Security Symposium (USENIX Security ’07). USENIX Association, 2007.
  13. J. C. King, “Symbolic execution and program testing,” Communications of the ACM, vol. 19, no. 7, pp. 385–394, 1976.
  14. D. A. Ramos and D. R. Engler, “Practical, low-effort equivalence verification of real code,” in International Conference on Computer Aided Verification (CAV). Springer, 2011, pp. 669–685.