Symbolic execution

In computer science, symbolic execution (also symbolic evaluation or symbex) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch. Finally, the possible inputs that trigger a branch can be determined by solving the constraints.

The field of symbolic simulation applies the same concept to hardware. Symbolic computation applies the concept to the analysis of mathematical expressions.

Example

Consider the program below, which reads in a value and fails if the input is 6.

int f() {
    ...
    y = read();
    z = y * 2;
    if (z == 12) {
        fail();
    } else {
        printf("OK");
    }
}

During a normal execution ("concrete" execution), the program would read a concrete input value (e.g., 5) and assign it to y. Execution would then proceed with the multiplication and the conditional branch, which would evaluate to false and print OK.

During symbolic execution, the program reads a symbolic value (e.g., λ) and assigns it to y. The program would then proceed with the multiplication and assign λ * 2 to z. When reaching the if statement, it would evaluate λ * 2 == 12. At this point of the program, λ could take any value, and symbolic execution can therefore proceed along both branches, by "forking" two paths. Each path gets assigned a copy of the program state at the branch instruction as well as a path constraint. In this example, the path constraint is λ * 2 == 12 for the if branch and λ * 2 != 12 for the else branch. Both paths can be symbolically executed independently. When paths terminate (e.g., as a result of executing fail() or simply exiting), symbolic execution computes a concrete value for λ by solving the accumulated path constraints on each path. These concrete values can be thought of as concrete test cases that can, e.g., help developers reproduce bugs. In this example, the constraint solver would determine that in order to reach the fail() statement, λ would need to equal 6.
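The final constraint-solving step can be illustrated with a small sketch (not part of the original example, and deliberately simplified): each path is summarized by a formula over the symbolic input λ, and any satisfying assignment is a concrete test case for that path. Real engines delegate this step to an SMT solver; the toy program below merely searches a small integer range.

#include <stdio.h>

/* Toy illustration: each path of f() above is summarized by a path
 * constraint over the symbolic input lambda.  A real engine hands
 * these constraints to an SMT solver; here we simply search a small
 * range of candidate concrete values. */

typedef int (*constraint)(int lambda);

static int if_branch(int lambda)   { return lambda * 2 == 12; }  /* path reaching fail() */
static int else_branch(int lambda) { return lambda * 2 != 12; }  /* path printing "OK"   */

static void solve(const char *name, constraint c) {
    for (int lambda = -100; lambda <= 100; lambda++) {
        if (c(lambda)) {
            printf("%s: satisfiable, e.g. lambda = %d\n", name, lambda);
            return;
        }
    }
    printf("%s: no solution in the searched range\n", name);
}

int main(void) {
    solve("lambda * 2 == 12 (if branch)", if_branch);
    solve("lambda * 2 != 12 (else branch)", else_branch);
    return 0;
}

Running this sketch reports λ = 6 for the fail() path, matching the value derived above.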

Limitations

Path explosion

Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with program size and can even be infinite in the case of programs with unbounded loop iterations. [1] Solutions to the path explosion problem generally use heuristics for path-finding to increase code coverage, [2] reduce execution time by parallelizing independent paths, [3] or merge similar paths. [4] One example of merging is veritesting, which "employs static symbolic execution to amplify the effect of dynamic symbolic execution". [5]
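The growth is easy to see in a contrived sketch (assumed for illustration, not taken from the cited work): each independent two-way branch doubles the number of feasible paths a symbolic executor must explore.

#include <stdio.h>

/* Illustration only: three independent branches over the inputs a, b
 * and c give a symbolic executor 2 * 2 * 2 = 8 feasible paths, and n
 * such branches give 2^n.  A loop whose bound depends on a symbolic
 * input can make the number of paths unbounded. */
int g(int a, int b, int c) {
    int r = 0;
    if (a > 0) r += 1;   /* fork 1 */
    if (b > 0) r += 2;   /* fork 2 */
    if (c > 0) r += 4;   /* fork 3 */
    return r;            /* eight distinct paths reach this point */
}

int main(void) {
    printf("%d\n", g(1, -1, 1));   /* one concrete path out of the eight */
    return 0;
}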

Program-dependent efficiency

Symbolic execution is used to reason about a program path-by-path, which is an advantage over reasoning about a program input-by-input, as other testing paradigms do (e.g., dynamic program analysis). However, if few inputs take the same path through the program, there is little saving over testing each of the inputs separately.
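A hypothetical contrast (not from the article) makes the point: in the function below essentially every distinct input induces a distinct path, so grouping inputs by path offers little advantage over plain per-input testing, unlike f() above, where all inputs other than 6 share a single path.

#include <stdio.h>

/* Illustration only: the loop body executes n times, so each value of
 * n yields a different path and path-based reasoning degenerates to
 * reasoning about (almost) every input separately. */
int sum_to(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

int main(void) {
    printf("%d\n", sum_to(5));   /* 0 + 1 + 2 + 3 + 4 = 10 */
    return 0;
}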

Memory aliasing

Symbolic execution is harder when the same memory location can be accessed through different names (aliasing). Aliasing cannot always be recognized statically, so the symbolic execution engine can't recognize that a change to the value of one variable also changes the other. [6]
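A minimal sketch (assumed for illustration; the cited work does not use this code) shows why aliasing matters: the effect of the store through p on the value read through q depends on whether the two pointers refer to the same location, which may only be decidable at run time.

#include <stdio.h>

/* Illustration only: whether the write through p also changes *q
 * depends on whether p and q alias.  An engine that cannot rule the
 * alias in or out statically must consider both cases to keep the
 * symbolic value of *q correct. */
int update(int *p, int *q) {
    *p = 6;             /* may or may not also change *q */
    return *q == 6;     /* result depends on the aliasing relation */
}

int main(void) {
    int a = 0, b = 0;
    printf("%d\n", update(&a, &b));   /* distinct locations: prints 0 */
    printf("%d\n", update(&a, &a));   /* aliased:            prints 1 */
    return 0;
}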

Arrays

Since an array is a collection of many distinct values, symbolic executors must either treat the entire array as one value or treat each array element as a separate location. The problem with treating each array element separately is that a reference such as "A[i]" can only be resolved dynamically, when i takes a concrete value. [6]
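A small sketch (illustrative only) shows the difficulty: with a symbolic index i, the expression A[i] does not name a single known cell, so the engine must either model the array as one symbolic object (for instance with an SMT theory of arrays) or case-split over the feasible concrete values of i.

#include <stdio.h>

/* Illustration only: which cell A[i] reads is unknown while i is
 * symbolic; the bounds check only constrains i to 0 <= i < 4, leaving
 * four possible locations on the in-bounds path. */
int lookup(const int A[4], int i) {
    if (i < 0 || i >= 4)
        return -1;       /* out-of-bounds path: constraint i < 0 || i >= 4 */
    return A[i];         /* in-bounds path: 0 <= i < 4, four candidate cells */
}

int main(void) {
    int A[4] = {10, 20, 30, 40};
    printf("%d\n", lookup(A, 2));   /* one concrete instantiation: prints 30 */
    return 0;
}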

Environment interactions

Programs interact with their environment by performing system calls, receiving signals, etc. Consistency problems may arise when execution reaches components that are not under control of the symbolic execution tool (e.g., kernel or libraries). Consider the following example:

int main() {
    FILE *fp = fopen("doc.txt", "w+");
    ...
    if (condition) {
        fputs("some data", fp);
    } else {
        fputs("some other data", fp);
    }
    ...
    data = fgets(..., fp);
}

This program opens a file and, based on some condition, writes different kinds of data to it. It then later reads back the written data. In theory, symbolic execution would fork two paths at the if statement, and each path from there on would have its own copy of the file. The fgets call would therefore return data that is consistent with the value of "condition" at the branch. In practice, file operations are implemented as system calls in the kernel and are outside the control of the symbolic execution tool. The main approaches to address this challenge are:

Executing calls to the environment directly. The advantage of this approach is that it is simple to implement. The disadvantage is that the side effects of such calls will clobber all states managed by the symbolic execution engine. In the example above, the fgets call would return "some datasome other data" or "some other datasome data" depending on the sequential ordering of the states.

Modeling the environment. In this case, the engine instruments the system calls with a model that simulates their effects and that keeps all the side effects in per-state storage. The advantage is that one would get correct results when symbolically executing programs that interact with the environment. The disadvantage is that one needs to implement and maintain many potentially complex models of system calls. Tools such as KLEE, [7] Cloud9, and Otter [8] take this approach by implementing models for file system operations, sockets, IPC, etc. A minimal sketch of this idea appears after the three approaches.

Forking the entire system state. Symbolic execution tools based on virtual machines solve the environment problem by forking the entire VM state. For example, in S2E [9] each state is an independent VM snapshot that can be executed separately. This approach alleviates the need for writing and maintaining complex models and allows virtually any program binary to be executed symbolically. However, it has higher memory usage overheads (VM snapshots may be large).
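The second approach (modeling the environment) can be sketched minimally, under heavily simplifying assumptions that are not part of any particular tool: a single file, bounded contents, and a write-then-read pattern. Each forked state carries its own in-memory copy of the file, so the data read back on a path stays consistent with what that path wrote.

#include <stdio.h>
#include <string.h>

/* Minimal, assumed sketch of per-state environment modeling: instead
 * of issuing real system calls, each execution state keeps a private
 * model of "doc.txt", so forked paths do not clobber each other. */
struct state {
    char file_contents[256];   /* per-state model of the file */
    size_t file_size;
};

static void model_fputs(struct state *s, const char *data) {
    size_t n = strlen(data);
    if (s->file_size + n < sizeof s->file_contents) {
        memcpy(s->file_contents + s->file_size, data, n);
        s->file_size += n;
    }
}

static const char *model_read_back(const struct state *s) {
    return s->file_contents;   /* NUL-terminated: struct is zero-initialized */
}

int main(void) {
    /* Two forked states, one per branch of the example above. */
    struct state then_state = {{0}, 0}, else_state = {{0}, 0};
    model_fputs(&then_state, "some data");
    model_fputs(&else_state, "some other data");
    printf("then path reads back: %s\n", model_read_back(&then_state));
    printf("else path reads back: %s\n", model_read_back(&else_state));
    return 0;
}

Real models in tools such as KLEE are far richer, but the state-local bookkeeping is the essential idea.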

Tools

Tool | Target | URL | Can anybody use it / Open source / Downloadable
angr | libVEX-based (supporting x86, x86-64, ARM, AArch64, MIPS, MIPS64, PPC, PPC64, and Java) | http://angr.io/ | yes
BE-PUM | x86 | https://github.com/NMHai/BE-PUM | yes
BINSEC | x86, ARM, RISC-V (32 bits) | http://binsec.github.io | yes
crucible | LLVM, JVM, etc. | https://github.com/GaloisInc/crucible | yes
ExpoSE | JavaScript | https://github.com/ExpoSEJS/ExpoSE | yes
FuzzBALL | Vine IL / Native | http://bitblaze.cs.berkeley.edu/fuzzball.html | yes
GenSym | LLVM | https://github.com/Generative-Program-Analysis/GenSym | yes
Jalangi2 | JavaScript | https://github.com/Samsung/jalangi2 | yes
janala2 | Java | https://github.com/ksen007/janala2 | yes
JaVerT | JavaScript | https://www.doc.ic.ac.uk/~pg/publications/FragosoSantos2019JaVerT.pdf | yes
JBSE | Java | https://github.com/pietrobraione/jbse | yes
jCUTE | Java | https://github.com/osl/jcute | yes
KeY | Java | http://www.key-project.org/ | yes
Kite | LLVM | http://www.cs.ubc.ca/labs/isd/Projects/Kite/ | yes
KLEE | LLVM | https://klee.github.io/ | yes
Kudzu | JavaScript | http://webblaze.cs.berkeley.edu/2010/kudzu/kudzu.pdf | no
MPro | Ethereum Virtual Machine (EVM) / Native | https://sites.google.com/view/smartcontract-analysis/home | yes
Maat | Ghidra P-code / SLEIGH | https://maat.re/ | yes
Manticore | x86-64, ARMv7, Ethereum Virtual Machine (EVM) / Native | https://github.com/trailofbits/manticore/ | yes
Mayhem | Binary | http://forallsecure.com | no
Mythril | Ethereum Virtual Machine (EVM) / Native | https://github.com/ConsenSys/mythril | yes
Otter | C | https://bitbucket.org/khooyp/otter/overview | yes
Oyente-NG | Ethereum Virtual Machine (EVM) / Native | http://www.comp.ita.br/labsca/waiaf/papers/RafaelShigemura_paper_16.pdf | no
Pathgrind [10] | Native 32-bit Valgrind-based | https://github.com/codelion/pathgrind | yes
Pex | .NET Framework | http://research.microsoft.com/en-us/projects/pex/ | no
pysymemu | x86-64 / Native | https://github.com/feliam/pysymemu/ | yes
Rosette | Dialect of Racket | https://emina.github.io/rosette/ | yes
Rubyx | Ruby | http://www.cs.umd.edu/~avik/papers/ssarorwa.pdf | no
S2E | x86, x86-64, ARM / User and kernel-mode binaries | http://s2e.systems/ | yes
Symbolic PathFinder (SPF) | Java Bytecode | https://github.com/SymbolicPathFinder | yes
SymDroid | Dalvik bytecode | http://www.cs.umd.edu/~jfoster/papers/symdroid.pdf | no
SymJS | JavaScript | https://core.ac.uk/download/pdf/24067593.pdf | no
SymCC | LLVM | https://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html | yes
Triton | x86, x86-64, ARM and AArch64 | https://triton.quarkslab.com | yes
Verifast | C, Java | https://people.cs.kuleuven.be/~bart.jacobs/verifast | yes

Earlier versions of the tools

  1. EXE [11] is an earlier version of KLEE.

History

The concept of symbolic execution was introduced academically in the 1970s with descriptions of: the Select system, [12] the EFFIGY system, [13] the DISSECT system, [14] and Clarke's system. [15]


References

  1. Anand, Saswat; Patrice Godefroid; Nikolai Tillmann (2008). "Demand-Driven Compositional Symbolic Execution". Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. pp. 367–381. doi:10.1007/978-3-540-78800-3_28. ISBN   978-3-540-78799-0.
  2. Ma, Kin-Keng; Khoo Yit Phang; Jeffrey S. Foster; Michael Hicks (2011). "Directed Symbolic Execution". Proceedings of the 18th International Conference on Static Analysis. Springer. pp. 95–111. ISBN 9783642237010. Retrieved 2013-04-03.
  3. Staats, Matt; Corina Pasareanu (2010). "Parallel symbolic execution for structural test generation". Proceedings of the 19th International Symposium on Software Testing and Analysis. pp. 183–194. doi:10.1145/1831708.1831732. ISBN   9781605588230. S2CID   9898522.
  4. Kuznetsov, Volodymyr; Kinder, Johannes; Bucur, Stefan; Candea, George (2012-01-01). "Efficient State Merging in Symbolic Execution". Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. New York, NY, USA: ACM. pp. 193–204. CiteSeerX   10.1.1.348.823 . doi:10.1145/2254064.2254088. ISBN   978-1-4503-1205-9. S2CID   135107.
  5. "Enhancing Symbolic Execution with Veritesting".
  6. DeMillo, Rich; Offutt, Jeff (1991-09-01). "Constraint-Based Automatic Test Data Generation". IEEE Transactions on Software Engineering. 17 (9): 900–910. doi:10.1109/32.92910.
  7. Cadar, Cristian; Dunbar, Daniel; Engler, Dawson (2008-01-01). "KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs". Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. OSDI'08: 209–224.
  8. Turpie, Jonathan; Reisner, Elnatan; Foster, Jeffrey; Hicks, Michael. "MultiOtter: Multiprocess Symbolic Execution" (PDF).
  9. Chipounov, Vitaly; Kuznetsov, Volodymyr; Candea, George (2012-02-01). "The S2E Platform: Design, Implementation, and Applications". ACM Trans. Comput. Syst. 30 (1): 2:1–2:49. doi:10.1145/2110356.2110358. ISSN   0734-2071. S2CID   16905399.
  10. Sharma, Asankhaya (2014). "Exploiting Undefined Behaviors for Efficient Symbolic Execution". ICSE Companion 2014: Companion Proceedings of the 36th International Conference on Software Engineering. pp. 727–729. doi:10.1145/2591062.2594450. ISBN   9781450327688. S2CID   10092664.
  11. Cadar, Cristian; Ganesh, Vijay; Pawlowski, Peter M.; Dill, David L.; Engler, Dawson R. (2008). "EXE: Automatically Generating Inputs of Death". ACM Trans. Inf. Syst. Secur. 12: 10:1–10:38. doi:10.1145/1455518.1455522. S2CID   10905673.
  12. Boyer, Robert S.; Elspas, Bernard; Levitt, Karl N. (1975). "SELECT—a formal system for testing and debugging programs by symbolic execution". Proceedings of the International Conference on Reliable Software. Los Angeles, California. pp. 234–245.
  13. King, James C. (1976). "Symbolic execution and program testing". Communications of the ACM. 19 (7): 385–394.
  14. Howden, William E. (1976). "Experiments with a symbolic evaluation system". Proceedings of the National Computer Conference.
  15. Clarke, Lori A. (1976). "A program testing system". ACM '76: Proceedings of the Annual Conference. Houston, Texas, United States. pp. 488–491.