Concolic testing

Concolic testing (a portmanteau of concrete and symbolic, also known as dynamic symbolic execution) is a hybrid software verification technique that performs symbolic execution, a classical technique that treats program variables as symbolic variables, along a concrete execution (testing on particular inputs) path. Symbolic execution is used in conjunction with an automated theorem prover or constraint solver based on constraint logic programming to generate new concrete inputs (test cases) with the aim of maximizing code coverage. Its main focus is finding bugs in real-world software, rather than demonstrating program correctness.

History

A description and discussion of the concept was introduced in "DART: Directed Automated Random Testing" by Patrice Godefroid, Nils Klarlund, and Koushik Sen. [1] The paper "CUTE: A concolic unit testing engine for C", [2] by Koushik Sen, Darko Marinov, and Gul Agha, further extended the idea to data structures and first coined the term concolic testing. Another tool, called EGT (renamed to EXE and later improved and renamed to KLEE), based on similar ideas, was independently developed by Cristian Cadar and Dawson Engler in 2005, and published in 2005 and 2006. [3] PathCrawler [4] [5] first proposed to perform symbolic execution along a concrete execution path, but unlike concolic testing PathCrawler does not simplify complex symbolic constraints using concrete values. These tools (DART, CUTE, and EXE) applied concolic testing to unit testing of C programs, and concolic testing was originally conceived as a white-box improvement upon established random testing methodologies. The technique was later generalized to testing multithreaded Java programs with jCUTE, [6] and to unit testing programs from their executable code (tool OSMOSE). [7] It was also combined with fuzz testing and extended to detect exploitable security issues in large-scale x86 binaries by Microsoft Research's SAGE. [8] [9]

The concolic approach is also applicable to model checking. In a concolic model checker, the model checker traverses states of the model representing the software being checked, while storing both a concrete state and a symbolic state. The symbolic state is used for checking properties on the software, while the concrete state is used to avoid reaching unreachable states. One such tool is ExpliSAT by Sharon Barner, Cindy Eisner, Ziv Glazberg, Daniel Kroening and Ishai Rabinovitz. [10]

Birth of concolic testing

Implementation of traditional symbolic-execution based testing requires a full-fledged symbolic interpreter for a programming language. Implementors of concolic testing noticed that a full symbolic interpreter can be avoided if symbolic execution is piggy-backed onto the normal execution of a program through instrumentation. This idea of simplifying the implementation of symbolic execution gave birth to concolic testing.
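
For illustration, the C function used in the example below could be instrumented at source level so that every relevant assignment and branch also reports its symbolic effect to a small runtime. The following is a minimal sketch under that assumption; sym_assign and sym_branch are hypothetical helpers invented here and do not correspond to the API of DART, CUTE, or any other tool.

#include <stdio.h>

/* Hypothetical concolic runtime: in a real tool these calls would append
   symbolic expressions and branch outcomes to a trace file; here they
   simply print the trace. */
static void sym_assign(const char *var, const char *expr) {
    printf("assign %s := %s\n", var, expr);
}
static void sym_branch(const char *cond, int taken) {
    printf("branch (%s) taken=%d\n", cond, taken);
}

/* The example function after source-level instrumentation. */
void f(int x, int y) {
    int z = 2 * y;
    sym_assign("z", "2*y");                 /* record symbolic value of z   */

    int taken1 = (x == 100000);
    sym_branch("x == 100000", taken1);      /* extend the path condition    */
    if (taken1) {
        int taken2 = (x < z);
        sym_branch("x < z", taken2);        /* extend the path condition    */
        if (taken2) {
            /* error */
        }
    }
}

int main(void) {
    f(1, 1);   /* one concrete run; the printed trace drives the symbolic replay */
    return 0;
}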

Development of SMT solvers

An important reason for the rise of concolic testing (and, more generally, symbolic-execution based analysis of programs) in the decade since it was introduced in 2005 is the dramatic improvement in the efficiency and expressive power of SMT solvers. The key technical developments that led to the rapid development of SMT solvers include the combination of theories, lazy solving, DPLL(T), and huge improvements in the speed of SAT solvers. SMT solvers that are particularly tuned for concolic testing include Z3, STP, Z3str2, and Boolector.

Example

Consider the following simple example, written in C:

void f(int x, int y) {
    int z = 2*y;
    if (x == 100000) {
        if (x < z) {
            assert(0); /* error */
        }
    }
}
[Figure: Execution path tree for this example. Three tests are generated, corresponding to the three leaf nodes in the tree and the three execution paths in the program.]

Simple random testing, trying random values of x and y, would require an impractically large number of tests to reproduce the failure: with 32-bit inputs, a randomly chosen x equals 100000 only with probability about 2^-32, so billions of runs are needed on average before the faulty branch is even reached.
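
To make the claim concrete, the following is a minimal random-testing harness for the function (an illustration added here, not part of the original example). random_int is a helper that combines two rand() calls so that values such as 100000 remain reachable even on platforms where RAND_MAX is only 32767; with inputs drawn from a range of about 2^31 values, the expected number of failures over 10^8 trials is well below one.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void f(int x, int y) {
    int z = 2 * y;
    if (x == 100000) {
        if (x < z) {
            assert(0); /* error */
        }
    }
}

/* Draw a non-negative pseudo-random int.  Two rand() calls are combined
   (shifted by 15 bits) because RAND_MAX is only guaranteed to be >= 32767. */
static int random_int(void) {
    return (int)((((unsigned)rand() << 15) ^ (unsigned)rand()) & 0x7fffffffu);
}

int main(void) {
    srand((unsigned)time(NULL));
    for (long i = 0; i < 100000000L; i++) {   /* 10^8 trials: almost certainly
                                                 no failing input is found     */
        f(random_int(), random_int());
    }
    puts("no failure found");
    return 0;
}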

We begin with an arbitrary choice for x and y, for example x = y = 1. In the concrete execution, line 2 sets z to 2, and the test in line 3 fails since 1 ≠ 100000. Concurrently, the symbolic execution follows the same path but treats x and y as symbolic variables. It sets z to the expression 2y and notes that, because the test in line 3 failed, x ≠ 100000. This inequality is called a path condition and must be true for all executions following the same execution path as the current one.

Since we'd like the program to follow a different execution path on the next run, we take the last path condition encountered, x ≠ 100000, and negate it, giving x = 100000. An automated theorem prover is then invoked to find values for the input variables x and y given the complete set of symbolic variable values and path conditions constructed during symbolic execution. In this case, a valid response from the theorem prover might be x = 100000, y = 0.

Running the program on this input allows it to reach the inner branch on line 4, which is not taken since 100000 (x) is not less than 0 (z = 2y). The path conditions are x = 100000 and x ≥ z. The latter is negated, giving x < z. The theorem prover then looks for x, y satisfying x = 100000, x < z, and z = 2y; for example, x = 100000, y = 50001. This input reaches the error.
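
The final constraint set x = 100000, x < z, z = 2y can be handed to any SMT solver. The following minimal sketch uses Z3's C API (assuming Z3 is installed and the program is linked against it, e.g. with -lz3); it only illustrates the solver query made in this step, not the machinery of any particular concolic tool.

#include <stdio.h>
#include <z3.h>

int main(void) {
    Z3_config  cfg = Z3_mk_config();
    Z3_context ctx = Z3_mk_context(cfg);
    Z3_del_config(cfg);

    Z3_solver s = Z3_mk_solver(ctx);
    Z3_solver_inc_ref(ctx, s);

    Z3_sort int_sort = Z3_mk_int_sort(ctx);
    Z3_ast x = Z3_mk_const(ctx, Z3_mk_string_symbol(ctx, "x"), int_sort);
    Z3_ast y = Z3_mk_const(ctx, Z3_mk_string_symbol(ctx, "y"), int_sort);

    /* z = 2*y */
    Z3_ast two    = Z3_mk_int(ctx, 2, int_sort);
    Z3_ast args[] = { two, y };
    Z3_ast z      = Z3_mk_mul(ctx, 2, args);

    /* Path conditions: x == 100000 and x < z */
    Z3_solver_assert(ctx, s, Z3_mk_eq(ctx, x, Z3_mk_int(ctx, 100000, int_sort)));
    Z3_solver_assert(ctx, s, Z3_mk_lt(ctx, x, z));

    if (Z3_solver_check(ctx, s) == Z3_L_TRUE) {
        Z3_model m = Z3_solver_get_model(ctx, s);
        Z3_model_inc_ref(ctx, m);
        printf("%s\n", Z3_model_to_string(ctx, m));  /* a satisfying assignment,
                                                        e.g. x = 100000, y = 50001 */
        Z3_model_dec_ref(ctx, m);
    }

    Z3_solver_dec_ref(ctx, s);
    Z3_del_context(ctx);
    return 0;
}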

Algorithm

Essentially, a concolic testing algorithm operates as follows (a schematic sketch of the resulting loop is given after the list):

  1. Classify a particular set of variables as input variables. These variables will be treated as symbolic variables during symbolic execution. All other variables will be treated as concrete values.
  2. Instrument the program so that each operation which may affect a symbolic variable value or a path condition is logged to a trace file, as well as any error that occurs.
  3. Choose an arbitrary input to begin with.
  4. Execute the program.
  5. Symbolically re-execute the program on the trace, generating a set of symbolic constraints (including path conditions).
  6. Negate the last path condition not already negated in order to visit a new execution path. If there is no such path condition, the algorithm terminates.
  7. Invoke an automated satisfiability solver on the new set of path conditions to generate a new input. If there is no input satisfying the constraints, return to step 6 to try the next execution path.
  8. Return to step 4.
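
A schematic sketch of this loop in C is given below. Input, PathCond, arbitrary_input, run_instrumented, negate_last_unflipped, and solve are hypothetical placeholders for the instrumented execution, the path-condition bookkeeping, and the constraint solver; the sketch shows only the control flow of steps 3–8, not the implementation of any real tool.

/* Schematic concolic driver loop (hypothetical placeholders, not a real API).
   run_instrumented executes the instrumented program and replays its trace
   symbolically (steps 4-5); negate_last_unflipped flips the deepest branch
   condition not tried yet, restoring any previously flipped one, and returns
   0 when none remain (step 6); solve invokes the constraint solver and
   returns NULL when the constraints are unsatisfiable (step 7). */
typedef struct Input Input;          /* concrete values for the input variables */
typedef struct PathCond PathCond;    /* ordered list of branch constraints      */

Input    *arbitrary_input(void);
PathCond *run_instrumented(const Input *in);
int       negate_last_unflipped(PathCond *pc);
Input    *solve(const PathCond *pc);

void concolic_test(void) {
    Input *in = arbitrary_input();                  /* step 3 */
    for (;;) {
        PathCond *pc = run_instrumented(in);        /* steps 4-5 */
        Input *next = NULL;
        while (negate_last_unflipped(pc)) {         /* step 6 */
            next = solve(pc);                       /* step 7 */
            if (next) break;                        /* satisfiable: new test input */
        }
        if (!next)
            return;                                 /* every path condition tried: stop */
        in = next;                                  /* step 8: execute again */
    }
}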

There are a few complications to the above procedure:

  - The algorithm enumerates execution paths one at a time, and realistic programs may have a very large or even infinite number of paths (for example when processing unbounded data structures), so the search is usually bounded, e.g. by limiting path depth or testing time.
  - The constraint solver may be unable to solve some path conditions, such as non-linear arithmetic or constraints involving calls into opaque library code. Concolic testing handles this by substituting concrete values from the current run for the problematic sub-expressions, which keeps the search going at the cost of possibly missing some paths.

Commercial success

Symbolic-execution based analysis and testing, in general, has witnessed a significant level of interest from industry [citation needed]. Perhaps the most famous commercial tool that uses dynamic symbolic execution (also known as concolic testing) is the SAGE tool from Microsoft. The KLEE and S2E tools (both of which are open-source tools, and use the STP constraint solver) are widely used in many companies including Micro Focus Fortify, NVIDIA, and IBM [citation needed]. Increasingly these technologies are being used by many security companies and hackers alike to find security vulnerabilities.

Limitations

Concolic testing has a number of limitations:

  - The number of feasible execution paths grows exponentially with program size and may be infinite, so only a fraction of the paths of a large program can be explored in practice (path explosion).
  - If the program behaves nondeterministically (for example because of thread scheduling), re-running it on a generated input may follow a different path than the one intended, degrading coverage and possibly preventing termination of the search.
  - Even for deterministic programs, coverage may remain poor because of imprecise symbolic modeling of some operations, incompleteness of the underlying solver, and the difficulty of steering the search toward the most fruitful parts of a large path tree.
  - Code that thoroughly mixes the values of its inputs, such as cryptographic functions, produces symbolic constraints that are practically unsolvable, so the branches guarded by such values are rarely exercised.

Tools

Many tools, notably DART and SAGE, have not been made available to the public at large. Note, however, that SAGE, for instance, is "used daily" for internal security testing at Microsoft. [12]

References

  1. Patrice Godefroid; Nils Klarlund; Koushik Sen (2005). "DART: Directed Automated Random Testing" (PDF). Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation. New York, NY: ACM. pp. 213–223. ISSN 0362-1340. Archived from the original (PDF) on 2008-08-29. Retrieved 2009-11-09.
  2. Koushik Sen; Darko Marinov; Gul Agha (2005). "CUTE: a concolic unit testing engine for C" (PDF). Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering. New York, NY: ACM. pp. 263–272. ISBN 1-59593-014-0. Archived from the original (PDF) on 2010-06-29. Retrieved 2009-11-09.
  3. Cristian Cadar; Vijay Ganesh; Peter Pawloski; David L. Dill; Dawson Engler (2006). "EXE: Automatically Generating Inputs of Death" (PDF). Proceedings of the 13th International Conference on Computer and Communications Security (CCS 2006). Alexandria, VA, USA: ACM.
  4. Nicky Williams; Bruno Marre; Patricia Mouy (2004). "On-the-Fly Generation of K-Path Tests for C Functions". Proceedings of the 19th IEEE International Conference on Automated Software Engineering (ASE 2004), 20–25 September 2004, Linz, Austria. IEEE Computer Society. pp. 290–293. ISBN 0-7695-2131-2.
  5. Nicky Williams; Bruno Marre; Patricia Mouy; Muriel Roger (2005). "PathCrawler: Automatic Generation of Path Tests by Combining Static and Dynamic Analysis". Dependable Computing - EDCC-5, 5th European Dependable Computing Conference, Budapest, Hungary, April 20–22, 2005, Proceedings. Springer. pp. 281–292. ISBN 3-540-25723-3.
  6. Koushik Sen; Gul Agha (August 2006). "CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools". Computer Aided Verification: 18th International Conference, CAV 2006, Seattle, WA, USA, August 17–20, 2006, Proceedings. Springer. pp. 419–423. ISBN 978-3-540-37406-0. Archived from the original on 2010-06-29. Retrieved 2009-11-09.
  7. Sébastien Bardin; Philippe Herrmann (April 2008). "Structural Testing of Executables" (PDF). Proceedings of the 1st IEEE International Conference on Software Testing, Verification, and Validation (ICST 2008), Lillehammer, Norway. IEEE Computer Society. pp. 22–31. ISBN 978-0-7695-3127-4.
  8. Patrice Godefroid; Michael Y. Levin; David Molnar (2007). Automated Whitebox Fuzz Testing (PDF) (Technical report). Microsoft Research. TR-2007-58.
  9. Patrice Godefroid (2007). "Random testing for security: blackbox vs. whitebox fuzzing" (PDF). Proceedings of the 2nd international workshop on Random testing: co-located with the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007). New York, NY: ACM. p. 1. ISBN 978-1-59593-881-7. Retrieved 2009-11-09.
  10. Sharon Barner; Cindy Eisner; Ziv Glazberg; Daniel Kroening; Ishai Rabinovitz (2006). "ExpliSAT: Guiding SAT-Based Software Verification with Explicit States". Haifa Verification Conference 2006. pp. 138–154.
  11. "Software".
  12. SAGE team (2009). "Microsoft PowerPoint - SAGE-in-one-slide" (PDF). Microsoft Research. Retrieved 2009-11-10.