Symbolic execution

In computer science, symbolic execution (also symbolic evaluation or symbex) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch. Finally, the possible inputs that trigger a branch can be determined by solving the constraints.

The field of symbolic simulation applies the same concept to hardware. Symbolic computation applies the concept to the analysis of mathematical expressions.

Example

Consider the program below, which reads in a value and fails if the input is 6.

int f() {
    ...
    y = read();
    z = y * 2;
    if (z == 12) {
        fail();
    } else {
        printf("OK");
    }
}

During a normal execution ("concrete" execution), the program would read a concrete input value (e.g., 5) and assign it to y. Execution would then proceed with the multiplication and the conditional branch, which would evaluate to false and print OK.

During symbolic execution, the program reads a symbolic value (e.g., λ) and assigns it to y. The program would then proceed with the multiplication and assign λ * 2 to z. When reaching the if statement, it would evaluate λ * 2 == 12. At this point of the program, λ could take any value, and symbolic execution can therefore proceed along both branches, by "forking" two paths. Each path gets assigned a copy of the program state at the branch instruction as well as a path constraint. In this example, the path constraint is λ * 2 == 12 for the if branch and λ * 2 != 12 for the else branch. Both paths can be symbolically executed independently. When paths terminate (e.g., as a result of executing fail() or simply exiting), symbolic execution computes a concrete value for λ by solving the accumulated path constraints on each path. These concrete values can be thought of as concrete test cases that can, e.g., help developers reproduce bugs. In this example, the constraint solver would determine that in order to reach the fail() statement, λ would need to equal 6.
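The final constraint-solving step can be illustrated with a small sketch (not part of the original example, and deliberately simplified): each path is summarized by a formula over the symbolic input λ, and any satisfying assignment is a concrete test case for that path. Real engines delegate this step to an SMT solver; the toy program below merely searches a small integer range.

#include <stdio.h>

/* Toy illustration: each path of f() above is summarized by a path
 * constraint over the symbolic input lambda.  A real engine hands
 * these constraints to an SMT solver; here we simply search a small
 * range of candidate concrete values. */

typedef int (*constraint)(int lambda);

static int if_branch(int lambda)   { return lambda * 2 == 12; }  /* path reaching fail() */
static int else_branch(int lambda) { return lambda * 2 != 12; }  /* path printing "OK"   */

static void solve(const char *name, constraint c) {
    for (int lambda = -100; lambda <= 100; lambda++) {
        if (c(lambda)) {
            printf("%s: satisfiable, e.g. lambda = %d\n", name, lambda);
            return;
        }
    }
    printf("%s: no solution in the searched range\n", name);
}

int main(void) {
    solve("lambda * 2 == 12 (if branch)", if_branch);
    solve("lambda * 2 != 12 (else branch)", else_branch);
    return 0;
}

Running this sketch reports λ = 6 for the fail() path, matching the value derived above.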

Limitations

Path explosion

Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with program size and can even be infinite in the case of programs with unbounded loop iterations. [1] Solutions to the path explosion problem generally use heuristics for path-finding to increase code coverage, [2] reduce execution time by parallelizing independent paths, [3] or merge similar paths. [4] One example of merging is veritesting, which "employs static symbolic execution to amplify the effect of dynamic symbolic execution". [5]
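The growth is easy to see in a contrived sketch (assumed for illustration, not taken from the cited work): each independent two-way branch doubles the number of feasible paths a symbolic executor must explore.

#include <stdio.h>

/* Illustration only: three independent branches over the inputs a, b
 * and c give a symbolic executor 2 * 2 * 2 = 8 feasible paths, and n
 * such branches give 2^n.  A loop whose bound depends on a symbolic
 * input can make the number of paths unbounded. */
int g(int a, int b, int c) {
    int r = 0;
    if (a > 0) r += 1;   /* fork 1 */
    if (b > 0) r += 2;   /* fork 2 */
    if (c > 0) r += 4;   /* fork 3 */
    return r;            /* eight distinct paths reach this point */
}

int main(void) {
    printf("%d\n", g(1, -1, 1));   /* one concrete path out of the eight */
    return 0;
}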

Program-dependent efficiency

Symbolic execution is used to reason about a program path-by-path, which is an advantage over reasoning about a program input-by-input, as other testing paradigms do (e.g., dynamic program analysis). However, if few inputs take the same path through the program, there is little saving over testing each of the inputs separately.
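A hypothetical contrast (not from the article) makes the point: in the function below essentially every distinct input induces a distinct path, so grouping inputs by path offers little advantage over plain per-input testing, unlike f() above, where all inputs other than 6 share a single path.

#include <stdio.h>

/* Illustration only: the loop body executes n times, so each value of
 * n yields a different path and path-based reasoning degenerates to
 * reasoning about (almost) every input separately. */
int sum_to(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

int main(void) {
    printf("%d\n", sum_to(5));   /* 0 + 1 + 2 + 3 + 4 = 10 */
    return 0;
}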

Memory aliasing

Symbolic execution is harder when the same memory location can be accessed through different names (aliasing). Aliasing cannot always be recognized statically, so the symbolic execution engine can't recognize that a change to the value of one variable also changes the other. [6]
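A minimal sketch (assumed for illustration; the cited work does not use this code) shows why aliasing matters: the effect of the store through p on the value read through q depends on whether the two pointers refer to the same location, which may only be decidable at run time.

#include <stdio.h>

/* Illustration only: whether the write through p also changes *q
 * depends on whether p and q alias.  An engine that cannot rule the
 * alias in or out statically must consider both cases to keep the
 * symbolic value of *q correct. */
int update(int *p, int *q) {
    *p = 6;             /* may or may not also change *q */
    return *q == 6;     /* result depends on the aliasing relation */
}

int main(void) {
    int a = 0, b = 0;
    printf("%d\n", update(&a, &b));   /* distinct locations: prints 0 */
    printf("%d\n", update(&a, &a));   /* aliased:            prints 1 */
    return 0;
}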

Arrays

Since an array is a collection of many distinct values, symbolic executors must either treat the entire array as one value or treat each array element as a separate location. The problem with treating each array element separately is that a reference such as "A[i]" can only be resolved dynamically, when i takes a concrete value. [6]
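A small sketch (illustrative only) shows the difficulty: with a symbolic index i, the expression A[i] does not name a single known cell, so the engine must either model the array as one symbolic object (for instance with an SMT theory of arrays) or case-split over the feasible concrete values of i.

#include <stdio.h>

/* Illustration only: which cell A[i] reads is unknown while i is
 * symbolic; the bounds check only constrains i to 0 <= i < 4, leaving
 * four possible locations on the in-bounds path. */
int lookup(const int A[4], int i) {
    if (i < 0 || i >= 4)
        return -1;       /* out-of-bounds path: constraint i < 0 || i >= 4 */
    return A[i];         /* in-bounds path: 0 <= i < 4, four candidate cells */
}

int main(void) {
    int A[4] = {10, 20, 30, 40};
    printf("%d\n", lookup(A, 2));   /* one concrete instantiation: prints 30 */
    return 0;
}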

Environment interactions

Programs interact with their environment by performing system calls, receiving signals, etc. Consistency problems may arise when execution reaches components that are not under control of the symbolic execution tool (e.g., kernel or libraries). Consider the following example:

int main() {
    FILE *fp = fopen("doc.txt", "w+");
    ...
    if (condition) {
        fputs("some data", fp);
    } else {
        fputs("some other data", fp);
    }
    ...
    data = fgets(..., fp);
}

This program opens a file and, based on some condition, writes different kinds of data to it. It then later reads back the written data. In theory, symbolic execution would fork two paths at the if statement, and each path from there on would have its own copy of the file. The fgets call would therefore return data that is consistent with the value of "condition" at the branch. In practice, file operations are implemented as system calls in the kernel and are outside the control of the symbolic execution tool. The main approaches to address this challenge are:

Executing calls to the environment directly. The advantage of this approach is that it is simple to implement. The disadvantage is that the side effects of such calls will clobber all states managed by the symbolic execution engine. In the example above, the fgets call would return "some datasome other data" or "some other datasome data" depending on the sequential ordering of the states.

Modeling the environment. In this case, the engine instruments the system calls with a model that simulates their effects and that keeps all the side effects in per-state storage. The advantage is that one would get correct results when symbolically executing programs that interact with the environment. The disadvantage is that one needs to implement and maintain many potentially complex models of system calls. Tools such as KLEE, [7] Cloud9, and Otter [8] take this approach by implementing models for file system operations, sockets, IPC, etc. A minimal sketch of this idea appears after the three approaches.

Forking the entire system state. Symbolic execution tools based on virtual machines solve the environment problem by forking the entire VM state. For example, in S2E [9] each state is an independent VM snapshot that can be executed separately. This approach alleviates the need for writing and maintaining complex models and allows virtually any program binary to be executed symbolically. However, it has higher memory usage overheads (VM snapshots may be large).
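The second approach (modeling the environment) can be sketched minimally, under heavily simplifying assumptions that are not part of any particular tool: a single file, bounded contents, and a write-then-read pattern. Each forked state carries its own in-memory copy of the file, so the data read back on a path stays consistent with what that path wrote.

#include <stdio.h>
#include <string.h>

/* Minimal, assumed sketch of per-state environment modeling: instead
 * of issuing real system calls, each execution state keeps a private
 * model of "doc.txt", so forked paths do not clobber each other. */
struct state {
    char file_contents[256];   /* per-state model of the file */
    size_t file_size;
};

static void model_fputs(struct state *s, const char *data) {
    size_t n = strlen(data);
    if (s->file_size + n < sizeof s->file_contents) {
        memcpy(s->file_contents + s->file_size, data, n);
        s->file_size += n;
    }
}

static const char *model_read_back(const struct state *s) {
    return s->file_contents;   /* NUL-terminated: struct is zero-initialized */
}

int main(void) {
    /* Two forked states, one per branch of the example above. */
    struct state then_state = {{0}, 0}, else_state = {{0}, 0};
    model_fputs(&then_state, "some data");
    model_fputs(&else_state, "some other data");
    printf("then path reads back: %s\n", model_read_back(&then_state));
    printf("else path reads back: %s\n", model_read_back(&else_state));
    return 0;
}

Real models in tools such as KLEE are far richer, but the state-local bookkeeping is the essential idea.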

Tools

Tool | Target | URL | Can anybody use it / Open source / Downloadable
angr | libVEX-based (supporting x86, x86-64, ARM, AArch64, MIPS, MIPS64, PPC, PPC64, and Java) | http://angr.io/ | yes
BE-PUM | x86 | https://github.com/NMHai/BE-PUM | yes
BINSEC | x86, ARM, RISC-V (32 bits) | http://binsec.github.io | yes
crucible | LLVM, JVM, etc. | https://github.com/GaloisInc/crucible | yes
ExpoSE | JavaScript | https://github.com/ExpoSEJS/ExpoSE | yes
FuzzBALL | Vine IL / Native | http://bitblaze.cs.berkeley.edu/fuzzball.html | yes
GenSym | LLVM | https://github.com/Generative-Program-Analysis/GenSym | yes
Jalangi2 | JavaScript | https://github.com/Samsung/jalangi2 | yes
janala2 | Java | https://github.com/ksen007/janala2 | yes
JaVerT | JavaScript | https://www.doc.ic.ac.uk/~pg/publications/FragosoSantos2019JaVerT.pdf | yes
JBSE | Java | https://github.com/pietrobraione/jbse | yes
jCUTE | Java | https://github.com/osl/jcute | yes
KeY | Java | http://www.key-project.org/ | yes
Kite | LLVM | http://www.cs.ubc.ca/labs/isd/Projects/Kite/ | yes
KLEE | LLVM | https://klee.github.io/ | yes
Kudzu | JavaScript | http://webblaze.cs.berkeley.edu/2010/kudzu/kudzu.pdf | no
MPro | Ethereum Virtual Machine (EVM) / Native | https://sites.google.com/view/smartcontract-analysis/home | yes
Maat | Ghidra P-code / SLEIGH | https://maat.re/ | yes
Manticore | x86-64, ARMv7, Ethereum Virtual Machine (EVM) / Native | https://github.com/trailofbits/manticore/ | yes
Mayhem | Binary | http://forallsecure.com | no
Mythril | Ethereum Virtual Machine (EVM) / Native | https://github.com/ConsenSys/mythril | yes
Otter | C | https://bitbucket.org/khooyp/otter/overview | yes
Oyente-NG | Ethereum Virtual Machine (EVM) / Native | http://www.comp.ita.br/labsca/waiaf/papers/RafaelShigemura_paper_16.pdf | no
Pathgrind [10] | Native 32-bit Valgrind-based | https://github.com/codelion/pathgrind | yes
Pex | .NET Framework | http://research.microsoft.com/en-us/projects/pex/ | no
pysymemu | x86-64 / Native | https://github.com/feliam/pysymemu/ | yes
Rosette | Dialect of Racket | https://emina.github.io/rosette/ | yes
Rubyx | Ruby | http://www.cs.umd.edu/~avik/papers/ssarorwa.pdf | no
S2E | x86, x86-64, ARM / User and kernel-mode binaries | http://s2e.systems/ | yes
Symbolic PathFinder (SPF) | Java Bytecode | https://github.com/SymbolicPathFinder | yes
SymDroid | Dalvik bytecode | http://www.cs.umd.edu/~jfoster/papers/symdroid.pdf | no
SymJS | JavaScript | https://core.ac.uk/download/pdf/24067593.pdf | no
SymCC | LLVM | https://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html | yes
Triton | x86, x86-64, ARM and AArch64 | https://triton.quarkslab.com | yes
Verifast | C, Java | https://people.cs.kuleuven.be/~bart.jacobs/verifast | yes

Earlier versions of the tools

  1. EXE [11] is an earlier version of KLEE.

History

The concept of symbolic execution was introduced academically in the 1970s with descriptions of: the Select system, [12] the EFFIGY system, [13] the DISSECT system, [14] and Clarke's system. [15]


References

  1. Anand, Saswat; Patrice Godefroid; Nikolai Tillmann (2008). "Demand-Driven Compositional Symbolic Execution". Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. pp. 367–381. doi:10.1007/978-3-540-78800-3_28. ISBN   978-3-540-78799-0.
  2. Ma, Kin-Keng; Khoo Yit Phang; Jeffrey S. Foster; Michael Hicks (2011). "Directed Symbolic Execution". Proceedings of the 18th International Conference on Static Analysis. Springer. pp. 95–111. ISBN 9783642237010. Retrieved 2013-04-03.
  3. Staats, Matt; Corina Pasareanu (2010). "Parallel symbolic execution for structural test generation". Proceedings of the 19th International Symposium on Software Testing and Analysis. pp. 183–194. doi:10.1145/1831708.1831732. ISBN   9781605588230. S2CID   9898522.
  4. Kuznetsov, Volodymyr; Kinder, Johannes; Bucur, Stefan; Candea, George (2012-01-01). "Efficient State Merging in Symbolic Execution". Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. New York, NY, USA: ACM. pp. 193–204. CiteSeerX   10.1.1.348.823 . doi:10.1145/2254064.2254088. ISBN   978-1-4503-1205-9. S2CID   135107.
  5. "Enhancing Symbolic Execution with Veritesting".
  6. DeMillo, Rich; Offutt, Jeff (1991-09-01). "Constraint-Based Automatic Test Data Generation". IEEE Transactions on Software Engineering. 17 (9): 900–910. doi:10.1109/32.92910.
  7. Cadar, Cristian; Dunbar, Daniel; Engler, Dawson (2008-01-01). "KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs". Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. OSDI'08: 209–224.
  8. Turpie, Jonathan; Reisner, Elnatan; Foster, Jeffrey; Hicks, Michael. "MultiOtter: Multiprocess Symbolic Execution" (PDF).
  9. Chipounov, Vitaly; Kuznetsov, Volodymyr; Candea, George (2012-02-01). "The S2E Platform: Design, Implementation, and Applications". ACM Trans. Comput. Syst. 30 (1): 2:1–2:49. doi:10.1145/2110356.2110358. ISSN   0734-2071. S2CID   16905399.
  10. Sharma, Asankhaya (2014). "Exploiting Undefined Behaviors for Efficient Symbolic Execution". ICSE Companion 2014: Companion Proceedings of the 36th International Conference on Software Engineering. pp. 727–729. doi:10.1145/2591062.2594450. ISBN   9781450327688. S2CID   10092664.
  11. Cadar, Cristian; Ganesh, Vijay; Pawlowski, Peter M.; Dill, David L.; Engler, Dawson R. (2008). "EXE: Automatically Generating Inputs of Death". ACM Trans. Inf. Syst. Secur. 12: 10:1–10:38. doi:10.1145/1455518.1455522. S2CID   10905673.
  12. Boyer, Robert S.; Elspas, Bernard; Levitt, Karl N. (1975). "SELECT—a formal system for testing and debugging programs by symbolic execution". Proceedings of the International Conference on Reliable Software. Los Angeles, California. pp. 234–245.
  13. King, James C. (1976). "Symbolic execution and program testing". Communications of the ACM. 19 (7): 385–394.
  14. Howden, William E. (1976). "Experiments with a symbolic evaluation system". Proceedings of the National Computer Conference.
  15. Clarke, Lori A. (1976). "A program testing system". ACM '76: Proceedings of the Annual Conference. Houston, Texas, United States. pp. 488–491.