Program analysis

Last updated

In computer science, program analysis [1] is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness. The first focuses on improving the program’s performance while reducing the resource usage while the latter focuses on ensuring that the program does what it is supposed to do.

Contents

Program analysis can be performed without executing the program (static program analysis), during runtime (dynamic program analysis) or in a combination of both.

Static program analysis

In the context of program correctness, static analysis can discover vulnerabilities during the development phase of the program. [2] These vulnerabilities are easier to correct than the ones found during the testing phase since static analysis leads to the root of the vulnerability.

Due to many forms of static analysis being computationally undecidable, the mechanisms for doing it will not always terminate with the right answer  either because they sometimes return a false negative ("no problems found" when the code does in fact have problems) or a false positive, or because they never return the wrong answer but sometimes never terminate. Despite their limitations, the first type of mechanism might reduce the number of vulnerabilities, while the second can sometimes give strong assurance of the lack of a certain class of vulnerabilities.

Incorrect optimizations are highly undesirable. So, in the context of program optimization, there are two main strategies to handle computationally undecidable analysis:

  1. An optimizer that is expected to complete in a relatively short amount of time, such as the optimizer in an optimizing compiler, may use a truncated version of an analysis that is guaranteed to complete in a finite amount of time, and guaranteed to only find correct optimizations.
  2. A third-party optimization tool may be implemented in such a way as to never produce an incorrect optimization, but also so that it can, in some situations, continue running indefinitely until it finds one (which may never happen). In this case, the developer using the tool would have to stop the tool and avoid running the tool on that piece of code again (or possibly modify the code to avoid tripping up the tool).

However, there is also a third strategy that is sometimes applicable for languages that are not completely specified, such as C. An optimizing compiler is at liberty to generate code that does anything at runtime  even crashes  if it encounters source code whose semantics are unspecified by the language standard in use.

Control-flow

The purpose of control-flow analysis is to obtain information about which functions can be called at various points during the execution of a program. The collected information is represented by a control-flow graph (CFG) where the nodes are instructions of the program and the edges represent the flow of control. By identifying code blocks and loops a CFG becomes a starting point for compiler-made optimizations.

Data-flow analysis

Data-flow analysis is a technique designed to gather information about the values at each point of the program and how they change over time. This technique is often used by compilers to optimize the code. One of the most well known examples of data-flow analysis is taint checking, which consists of considering all variables that contain user-supplied data  which is considered "tainted", i.e. insecure  and preventing those variables from being used until they have been sanitized. This technique is often used to prevent SQL injection attacks. Taint checking can be done statically or dynamically.

Abstract interpretation

Abstract interpretation allows the extraction of information about a possible execution of a program without actually executing the program. This information can be used by compilers to look for possible optimizations or for certifying a program against certain classes of bugs.

Type systems

Type systems associate types to programs that fulfill certain requirements. Their purpose is to select a subset of programs of a language that are considered correct according to a property.

Type checking is used in programming to limit how programming objects are used and what can they do. This is done by the compiler or interpreter. Type checking can also help prevent vulnerabilities by ensuring that a signed value isn't attributed to an unsigned variable. Type checking can be done statically (at compile time), dynamically (at runtime) or a combination of both.

Static type information (either inferred, or explicitly provided by type annotations in the source code) can also be used to do optimizations, such as replacing boxed arrays with unboxed arrays.

Effect systems

Effect systems are formal systems designed to represent the effects that executing a function or method can have. An effect codifies what is being done and with what it is being done  usually referred to as effect kind and effect region, respectively.[ clarification needed ]

Model checking

Model checking refers to strict, formal, and automated ways to check if a model (which in this context means a formal model of a piece of code, though in other contexts it can be a model of a piece of hardware) complies with a given specification. Due to the inherent finite-state nature of code, and both the specification and the code being convertible into logical formulae, it is possible to check if the system violates the specification using efficient algorithmic methods.

Dynamic program analysis

Dynamic analysis can use runtime knowledge of the program to increase the precision of the analysis, while also providing runtime protection, but it can only analyze a single execution of the problem and might degrade the program’s performance due to the runtime checks.

Testing

Software should be tested to ensure its quality and that it performs as it is supposed to in a reliable manner, and that it won’t create conflicts with other software that may function alongside it. The tests are performed by executing the program with an input and evaluating its behavior and the produced output. Even if no security requirements are specified, additional security testing should be performed to ensure that an attacker can’t tamper with the software and steal information, disrupt the software’s normal operations, or use it as a pivot to attack its users.

Monitoring

Program monitoring records and logs different kinds of information about the program such as resource usage, events, and interactions, so that it can be reviewed to find or pinpoint causes of abnormal behavior. Furthermore, it can be used to perform security audits. Automated monitoring of programs is sometimes referred to as runtime verification.

Program slicing

For a given subset of a program’s behavior, program slicing consists of reducing the program to the minimum form that still produces the selected behavior. The reduced program is called a “slice” and is a faithful representation of the original program within the domain of the specified behavior subset. Generally, finding a slice is an unsolvable problem, but by specifying the target behavior subset by the values of a set of variables, it is possible to obtain approximate slices using a data-flow algorithm. These slices are usually used by developers during debugging to locate the source of errors.

See also

Related Research Articles

In computer science, static program analysis is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution.

OCaml is a general-purpose, high-level multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.

<span class="mw-page-title-main">Library (computing)</span> Collection of non-volatile resources used by computer programs

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

In computer science, compile time describes the time window during which a language's statements are converted into 0s and 1s for the computer to understand. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concepts related to the context of program execution (runtime). For example, compile-time requirements are programming language requirements that must be met by source code before compilation and compile-time properties are properties of the program that can be reasoned about during compilation. The actual length of time it takes to compile a program is usually referred to as compilation time.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term". Usually the terms are various constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In computing, just-in-time (JIT) compilation is a way of executing computer code that involves compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

In compiler theory, dead-code elimination is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefits: it shrinks program size, an important consideration in some contexts, and it allows the running program to avoid executing irrelevant operations, which reduces its running time. It can also enable further optimizations by simplifying program structure. Dead code includes code that can never be executed, and code that only affects dead variables, that is, irrelevant to the program.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In computer programming, unreachable code is part of the source code of a program which can never be executed because there exists no control flow path to the code from the rest of the program.

Runtime verification is a computing system analysis and execution approach based on extracting information from a running system and using it to detect and possibly react to observed behaviors satisfying or violating certain properties. Some very particular properties, such as datarace and deadlock freedom, are typically desired to be satisfied by all systems and may be best implemented algorithmically. Other properties can be more conveniently captured as formal specifications. Runtime verification specifications are typically expressed in trace predicate formalisms, such as finite state machines, regular expressions, context-free patterns, linear temporal logics, etc., or extensions of these. This allows for a less ad-hoc approach than normal testing. However, any mechanism for monitoring an executing system is considered runtime verification, including verifying against test oracles and reference implementations. When formal requirements specifications are provided, monitors are synthesized from them and infused within the system by means of instrumentation. Runtime verification can be used for many purposes, such as security or safety policy monitoring, debugging, testing, verification, validation, profiling, fault protection, behavior modification, etc. Runtime verification avoids the complexity of traditional formal verification techniques, such as model checking and theorem proving, by analyzing only one or a few execution traces and by working directly with the actual system, thus scaling up relatively well and giving more confidence in the results of the analysis, at the expense of less coverage. Moreover, through its reflective capabilities runtime verification can be made an integral part of the target system, monitoring and guiding its execution during deployment.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

Taint checking is a feature in some computer programming languages, such as Perl, Ruby or Ballerina designed to increase security by preventing malicious users from executing commands on a host computer. Taint checks highlight specific security risks primarily associated with web sites which are attacked using techniques such as SQL injection or buffer overflow attack approaches.

Dynamic program analysis is analysis of computer software that involves executing the program in question. Dynamic program analysis includes familiar techniques from software engineering such as unit testing, debugging, and measuring code coverage, but also includes lesser-known techniques like program slicing and invariant inference. Dynamic program analysis is widely applied in security in the form of runtime memory error detection, fuzzing, dynamic symbolic execution, and taint tracking.

Dynamic testing is a term used in software engineering to describe the testing of the dynamic behavior of code.

For several years parallel hardware was only available for distributed computing but recently it is becoming available for the low end computers as well. Hence it has become inevitable for software programmers to start writing parallel applications. It is quite natural for programmers to think sequentially and hence they are less acquainted with writing multi-threaded or parallel processing applications. Parallel programming requires handling various issues such as synchronization and deadlock avoidance. Programmers require added expertise for writing such applications apart from their expertise in the application domain. Hence programmers prefer to write sequential code and most of the popular programming languages support it. This allows them to concentrate more on the application. Therefore, there is a need to convert such sequential applications to parallel applications with the help of automated tools. The need is also non-trivial because large amount of legacy code written over the past few decades needs to be reused and parallelized.

<span class="mw-page-title-main">Parasoft C/C++test</span> Integrated set of tools

Parasoft C/C++test is an integrated set of tools for testing C and C++ source code that software developers use to analyze, test, find defects, and measure the quality and security of their applications. It supports software development practices that are part of development testing, including static code analysis, dynamic code analysis, unit test case generation and execution, code coverage analysis, regression testing, runtime error detection, requirements traceability, and code review. It's a commercial tool that supports operation on Linux, Windows, and Solaris platforms as well as support for on-target embedded testing and cross compilers.

In computer science, language-based security (LBS) is a set of techniques that may be used to strengthen the security of applications on a high level by using the properties of programming languages. LBS is considered to enforce computer security on an application-level, making it possible to prevent vulnerabilities which traditional operating system security is unable to handle.

<span class="mw-page-title-main">StaDyn (programming language)</span> Language for the .NET platform

StaDyn is an object-oriented general-purpose programming language for the .NET platform that supports both static and dynamic typing in the same programming language.

References

  1. Nielson, F., Nielson, H. R., & Hankin, C. (2015). Principles of program analysis. Springer.
  2. Jovanovic, N., Kruegel, C., & Kirda, E. (2006, May). Pixy: A static analysis tool for detecting web application vulnerabilities. In Security and Privacy, 2006 IEEE Symposium on (pp. 6-pp). IEEE.

Further reading