Program slicing

In computer programming, program slicing is the computation of the set of program statements, the program slice, that may affect the values at some point of interest, referred to as a slicing criterion. Program slicing can be used in debugging to locate the source of errors more easily. Other applications of slicing include software maintenance, optimization, program analysis, and information flow control.

Slicing techniques have seen rapid development since the original definition by Mark Weiser. At first, slicing was only static, i.e., applied to the source code with no information other than the source code itself. Bogdan Korel and Janusz Laski introduced dynamic slicing, which works on a specific execution of the program (for a given execution trace).[1] Other forms of slicing exist, for instance path slicing.[2]

Static slicing

Based on the original definition of Weiser,[3] informally, a static program slice S consists of all statements in program P that may affect the value of variable v in a statement x. The slice is defined for a slicing criterion C = (x, v), where x is a statement in program P and v is a variable in x. A static slice includes all the statements that can affect the value of variable v at statement x for any possible input. Static slices are computed by backtracking dependencies between statements. More specifically, to compute the static slice for (x, v), we first find all statements that can directly affect the value of v before statement x is encountered. Recursively, for each statement y which can affect the value of v in statement x, we compute the slices for all variables z in y that affect the value of v. The union of all those slices is the static slice for (x, v).
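
To make this recursive definition concrete, the following is a minimal sketch in C, assuming the data and control dependences between statements have already been computed and stored in a table. The statement numbering and the deps table are hypothetical, hand-encoding the example program of the next section; a real slicer would derive the dependences from the program itself.

    #include <stdio.h>
    #include <stdbool.h>

    #define NSTMT 7

    /* deps[i][j] is true when statement i directly depends on statement j:
     * i reads a variable that j defines, or j controls whether i executes.
     * The table hand-encodes the example program:
     *   0: int i;            1: int sum = 0;   2: int product = 1;
     *   3: int w = 7;        4: for(i = 1; i < N; ++i)
     *   5:   sum = sum + i + w;                6:   product = product * i;
     */
    static const bool deps[NSTMT][NSTMT] = {
        [4] = { [0] = true, [4] = true },  /* the loop reads and writes i */
        [5] = { [0] = true, [1] = true, [3] = true, [4] = true, [5] = true },
        [6] = { [0] = true, [2] = true, [4] = true, [6] = true },
    };

    /* Mark the criterion statement and, recursively, every statement it
     * transitively depends on; the in_slice flags double as a visited set. */
    static void slice(int criterion, bool in_slice[NSTMT]) {
        in_slice[criterion] = true;
        for (int j = 0; j < NSTMT; ++j)
            if (deps[criterion][j] && !in_slice[j])
                slice(j, in_slice);
    }

    int main(void) {
        bool in_slice[NSTMT] = { false };
        slice(5, in_slice);              /* criterion: sum at statement 5 */
        for (int s = 0; s < NSTMT; ++s)
            if (in_slice[s])
                printf("statement %d is in the slice\n", s);
        return 0;
    }

Running this for the criterion (statement 5, sum) marks statements 0, 1, 3, 4, and 5, matching the executable slice derived by hand in the example that follows.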

Example

For example, consider the C program below. Let's compute the slice for (write(sum), sum). The value of sum is directly affected by the statement "sum = sum + i + w" if N > 1 and by "int sum = 0" if N <= 1. So, slice(write(sum), sum) is the union of three slices and the "int sum = 0" statement, which has no dependencies:

  1. slice(sum = sum + i + w, sum),
  2. slice(sum = sum + i + w, i),
  3. slice(sum = sum + i + w, w), and
  4. { int sum = 0 }.

It is fairly easy to see that slice(sum = sum + i + w, sum) consists of "sum = sum + i + w" and "int sum = 0" because those are the only two prior statements that can affect the value of sum at "sum = sum + i + w". Similarly, slice(sum = sum + i + w, i) only contains "for(i = 1; i < N; ++i) {" and slice(sum = sum + i + w, w) only contains the statement "int w = 7".

When we take the union of all of those statements, we do not have executable code, so to make the slice executable we merely add the closing brace for the for loop and the declaration of i. The original program and the resulting static executable slice are shown below.

    int i;
    int sum = 0;
    int product = 1;
    int w = 7;
    for(i = 1; i < N; ++i) {
        sum = sum + i + w;
        product = product * i;
    }
    write(sum);
    write(product);

The static executable slice for the criterion (write(sum), sum) is the new program shown below.

    int i;
    int sum = 0;
    int w = 7;
    for(i = 1; i < N; ++i) {
        sum = sum + i + w;
    }
    write(sum);

In fact, most static slicing techniques, including Weiser's own, will also remove the write(sum) statement, since at the statement write(sum) the value of sum does not depend on the statement itself. Often a slice for a particular statement x will include more than one variable. If V is a set of variables in statement x, then the slice for (x, V) is the union of all slices with criteria (x, v) where v is a variable in the set V.

Lightweight forward static slicing approach

A very fast and scalable, yet slightly less accurate, slicing approach is useful for a number of reasons. It gives developers a low-cost, practical means to estimate the impact of a change within minutes rather than days, which matters when planning the implementation of new features and understanding how a change relates to other parts of the system. It also provides an inexpensive test to determine whether a full, more expensive analysis of the system is warranted. Finally, a fast slicing approach opens up new avenues of research in metrics and the mining of histories based on slicing: slicing can be conducted on very large systems, and on entire version histories, in practical time frames, enabling experiments and empirical investigations previously too costly to undertake.[4]

Dynamic slicing

Dynamic slicing makes use of information about a particular execution of a program. A dynamic slice contains all statements that actually affect the value of a variable at a program point for that particular execution, rather than all statements that may affect the value for any arbitrary execution of the program.

An example clarifies the difference between static and dynamic slicing. Consider a small program unit containing an iteration block with an if-else block inside it, where statements in both the if and else blocks affect some variable. Static slicing examines the whole program unit irrespective of any particular execution, so the affected statements in both blocks are included in the slice. Dynamic slicing, by contrast, considers a particular execution: if in that execution the if block executes and the affected statements in the else block do not, the dynamic slice contains only the statements in the if block, as the sketch below illustrates.
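
A minimal, hypothetical C fragment makes this concrete; the variable names and the input value flag = 1 are assumptions for illustration.

    #include <stdio.h>

    int main(void) {
        int flag = 1;        /* the particular input for this execution */
        int x = 0;
        for (int i = 0; i < 3; ++i) {
            if (flag)
                x = x + i;   /* executes in this run: in the dynamic slice */
            else
                x = x - i;   /* never executes when flag == 1: excluded from
                                the dynamic slice, but kept in the static one */
        }
        printf("%d\n", x);   /* slicing criterion: (this statement, x) */
        return 0;
    }

A static slice for this criterion must include both assignments to x, since either branch may run for some input; the dynamic slice for the execution with flag = 1 contains only "x = x + i".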

Notes

  1. Korel, Bogdan; Laski, Janusz (1988). "Dynamic Program Slicing". Information Processing Letters. 29 (3): 155–163. CiteSeerX 10.1.1.158.9078. doi:10.1016/0020-0190(88)90054-3.
  2. Jhala, Ranjit; Majumdar, Rupak (2005). "Path Slicing". Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '05. New York, NY, USA: ACM. pp. 38–47. doi:10.1145/1065010.1065016. ISBN 9781595930569. S2CID 5065847.
  3. Weiser, Mark David (1979). Program Slices: Formal, Psychological, and Practical Investigations of an Automatic Program Abstraction Method (PhD thesis). Ann Arbor, MI, USA: University of Michigan.
  4. Alomari, Hakam W.; Collard, Michael L.; Maletic, Jonathan I.; Alhindawi, Nouh; Meqdadi, Omar (2014). "srcSlice: Very Efficient and Scalable Forward Static Slicing". Journal of Software: Evolution and Process. 26 (11): 931–961. CiteSeerX 10.1.1.641.8891. doi:10.1002/smr.1651. ISSN 2047-7473. S2CID 18520643.
