Pointer analysis

Last updated

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

Contents

This is the most common colloquial use of the term. A secondary use has pointer analysis be the collective name for both points-to analysis, defined as above, and alias analysis. Points-to and alias analysis are closely related but not always equivalent problems.

Example

For the following example program, a points-to analysis would compute that the points-to set of p is {x, y}.

intx;inty;int*p=unknown()?&x:&y;

Introduction

As a form of static analysis, fully precise pointer analysis can be shown to be undecidable. [1] Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include: [2] [3]

Context-Insensitive, Flow-Insensitive Algorithms

Pointer analysis algorithms are used to convert collected raw pointer usages (assignments of one pointer to another or assigning a pointer to point to another one) to a useful graph of what each pointer can point to. [4]

Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in SVF [5] and LLVM.

Flow-Insensitive Approaches

Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location). [6]

Flow-insensitive pointer analyses often abstract possible runtime allocations by their allocation site. At runtime, this program creates three separate heap allocations. A flow-insensitive pointer analysis would treat these as a single abstract memory location, leading to a loss of precision. Pointer Analysis - Abstracting Memory Addresses by Their Allocation Site.svg
Flow-insensitive pointer analyses often abstract possible runtime allocations by their allocation site. At runtime, this program creates three separate heap allocations. A flow-insensitive pointer analysis would treat these as a single abstract memory location, leading to a loss of precision.

Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java. [7]

Context-sensitive, flow-sensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per context. [8] Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types). [9] To ensure termination (and more generally, scalability), such analyses generally use a k-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed. [10] Three common variants of context-sensitive, flow-insensitive analysis are: [11]

Call-site sensitivity

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of callsites in the program. These contexts abstract the control-flow of the program.

The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.

int*id(int*x){returnx;}intmain(){inty;intz;int*y2=id(&y);// call-site 1int*z2=id(&z);// call-site 2return0;}

For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that x can point to either the allocation holding y or that of z, so y2 and z2 may alias, and both could point to either allocation. A callsite-sensitive analysis would analyze id twice, once for call-site 1 and once for call-site 2, and the points-to facts for x would be qualified by the call-site, enabling the analysis to deduce that when main returns, y2 can only point to the allocation holding y and z2 can only point to the allocation holding z.

Object sensitivity

In an object sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object-sensitivity is non-syntactic or non-local: the context entries are derived during the points-to analysis itself. [12]

Type sensitivity

Type sensitivity is a variant of object sensitivity where the allocation site of the receiver object is replaced by the class/type containing the method containing the allocation site of the receiver object. [13] This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer science, abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattices. It can be viewed as a partial execution of a computer program which gains information about its semantics without performing all the calculations.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. A program's control-flow graph (CFG) is used to determine those parts of a program to which a particular value assigned to a variable might propagate. The information gathered is often used by compilers when optimizing a program. A canonical example of a data-flow analysis is reaching definitions.

SIGPLAN is the Association for Computing Machinery's Special Interest Group on programming languages.

Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location.

In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers – where in the program a pointer can be accessed. It is related to pointer analysis and shape analysis.

In program analysis, shape analysis is a static code analysis technique that discovers and verifies properties of linked, dynamically allocated data structures in computer programs. It is typically used at compile time to find software bugs or to verify high-level correctness properties of programs. In Java programs, it can be used to ensure that a sort method correctly sorts a list. For C programs, it might look for places where a block of memory is not properly freed.

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error detection checks array bounds and pointer dereferences. In contrast, C and C++ allow arbitrary pointer arithmetic with pointers implemented as direct memory addresses with no provision for bounds checking, and thus are potentially memory-unsafe.

In computer science, region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region, also called a zone, arena, area, or memory context, is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Like stack allocation, regions facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the stack frame in which they were allocated. In typical implementations, all objects in a region are allocated in a single contiguous range of memory addresses, similarly to how stack frames are typically allocated.

Typestate analysis, sometimes called protocol analysis, is a form of program analysis employed in programming languages. It is most commonly applied to object-oriented languages. Typestates define valid sequences of operations that can be performed upon an instance of a given type. Typestates, as the name suggests, associate state information with variables of that type. This state information is used to determine at compile-time which operations are valid to be invoked upon an instance of the type. Operations performed on an object that would usually only be executed at run-time are performed upon the type state information which is modified to be compatible with the new state of the object.

In program analysis, a polyvariant or context-sensitive analysis analyzes each function multiple times—typically once at each call site—to improve the precision of the analysis. Polyvariance is common in data-flow and pointer analyses.

<span class="mw-page-title-main">Kathryn S. McKinley</span> American computer scientist

Kathryn S. McKinley is an American computer scientist noted for her research on compilers, runtime systems, and computer architecture. She is also known for her leadership in broadening participation in computing. McKinley was co-chair of CRA-W from 2011 to 2014.

In computer science, Steensgaard's algorithm is a scalable, flow-insensitive, algorithm for pointer analysis. It is often used in compilers, due to its speed. In its original formulation, this algorithm was field-, context-, and array-insensitive.

Runtime predictive analysis is a runtime verification technique in computer science for detecting property violations in program executions inferred from an observed execution. An important class of predictive analysis methods has been developed for detecting concurrency errors in concurrent programs, where a runtime monitor is used to predict errors which did not happen in the observed run, but can happen in an alternative execution of the same program. The predictive capability comes from the fact that the analysis is performed on an abstract model extracted online from the observed execution, which admits a class of executions beyond the observed one.

<span class="mw-page-title-main">Yannis Smaragdakis</span> American computer scientist

Yannis Smaragdakis is a Greek-American software engineer, computer programmer, and researcher. He is a professor in the Department of Informatics and Telecommunications at the University of Athens. He is the author of more than 130 research articles on a variety of topics, including program analysis, declarative languages, program generators, language design, and concurrency. He is best known for work in program generation and program analysis and the Doop framework.

The LogicBlox system is a commercial, declarative, incremental logic programming language and deductive database inspired by Datalog. The LogiQL programming language extends Datalog with several features, including stratified negation, aggregation, and a module system. LogicBlox has been used to build pointer analyses for Java.

References

  1. Reps, Thomas (2000-01-01). "Undecidability of context-sensitive data-dependence analysis". ACM Transactions on Programming Languages and Systems. 22 (1): 162–186. doi: 10.1145/345099.345137 . ISSN   0164-0925. S2CID   2956433.
  2. Barbara G. Ryder (2003). "Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages". Compiler Construction, 12th International Conference, CC 2003 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2003 Warsaw, Poland, April 7–11, 2003 Proceedings. pp. 126–137. doi: 10.1007/3-540-36579-6_10 .
  3. ( Hind )
  4. Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches" (PDF). ICPC '19: Proceedings of the 27th IEEE International Conference on Program Comprehension. Montreal, Canada: IEEE.
  5. Sui, Yulei; Xue, Jingling (2016). "SVF: interprocedural static value-flow analysis in LLVM" (PDF). CC'16: Proceedings of the 25th international conference on compiler construction. ACM.
  6. Smaragdakis, Yannis; Bravenboer, Martin; Lhoták, Ondrej (2011-01-26). "Pick your contexts well". Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '11. Austin, Texas, USA: Association for Computing Machinery. pp. 17–30. doi:10.1145/1926385.1926390. ISBN   978-1-4503-0490-0. S2CID   6451826.
  7. Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. Barcelona, Spain: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN   978-1-4503-5072-3. S2CID   3074689.
  8. ( Smaragdakis & Balatsouras , p. 29)
  9. Thiessen, Rei; Lhoták, Ondřej (2017-06-14). "Context transformations for pointer analysis". ACM SIGPLAN Notices. 52 (6): 263–277. doi:10.1145/3140587.3062359. ISSN   0362-1340.
  10. ( Li et al. , pp. 1:4)
  11. ( Smaragdakis & Balatsouras )
  12. ( Smaragdakis & Balatsouras , p. 37)
  13. ( Smaragdakis & Balatsouras , p. 39)

Bibliography