Steensgaard's algorithm

In computer science, Steensgaard's algorithm is a scalable, flow-insensitive algorithm for pointer analysis. It is often used in compilers due to its speed (for example, an implementation is available in the LLVM compiler framework).[1] In its original formulation, the algorithm was field-, context-, and array-insensitive.

Steensgaard's algorithm is based on equality constraints,[2] as opposed to Andersen's algorithm, which is based on subset constraints. This allows points-to information to be tracked with a union-find data structure, a choice that gives the algorithm its characteristic speed: implemented this way, it runs in linear space and almost linear time in the size of the input program.
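
As a rough illustration of how equality constraints interact with union-find, the sketch below (a minimal Python model with made-up names such as Cell and pointee; not the LLVM pass or Steensgaard's original code) keeps at most one points-to class per equivalence class and merges classes whenever two sides of an assignment must agree. Every statement contributes a constant number of constraints, and the union/find work over all of them is what yields the almost-linear bound.

```python
# Illustrative Steensgaard-style points-to sketch (hypothetical names).
# Every variable or location lives in an equivalence class (a union-find
# node); each class carries at most one outgoing points-to edge, so merging
# two classes may recursively merge their targets as well.

class Cell:
    def __init__(self, name):
        self.name = name
        self.parent = self      # union-find parent pointer
        self.points_to = None   # the single class this class may point to

def find(c):
    while c.parent is not c:
        c.parent = c.parent.parent  # path halving
        c = c.parent
    return c

def union(a, b):
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    # Keep at most one points-to edge per class: merge targets if both exist.
    if a.points_to and b.points_to:
        a.points_to = union(a.points_to, b.points_to)
    else:
        a.points_to = a.points_to or b.points_to
    return a

def pointee(c):
    """The class c points to, created on demand."""
    r = find(c)
    if r.points_to is None:
        r.points_to = Cell("*" + r.name)
    return r.points_to

# One equality constraint per canonical statement form:
def addr_of(x, y):  union(pointee(x), y)                    # x = &y
def copy(x, y):     union(pointee(x), pointee(y))           # x = y
def load(x, y):     union(pointee(x), pointee(pointee(y)))  # x = *y
def store(x, y):    union(pointee(pointee(x)), pointee(y))  # *x = y

# p = &a; q = &b; p = q  -- the equality constraints conflate a and b into
# one class, whereas a subset-based (Andersen-style) analysis would keep
# them as two distinct targets of p.
a, b, p, q = Cell("a"), Cell("b"), Cell("p"), Cell("q")
addr_of(p, a)
addr_of(q, b)
copy(p, q)
print(find(a) is find(b))  # True
```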

Bjarne Steensgaard formulated the algorithm in terms of type inference and type checking. He proposed the points-to analysis for a small but generic imperative pointer language that captures the essential properties of common languages with pointers, such as C; the language's semantics and typing rules together constitute the analysis.
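
In that formulation, each variable is assigned a type describing the locations it may reference, and each of the four canonical statement forms of such a pointer language forces a unification. A simplified constraint schema (not the paper's exact typing rules) is:

```latex
\begin{aligned}
x = \&y &:\quad \mathit{pt}(x) = \mathit{class}(y)\\
x = y   &:\quad \mathit{pt}(x) = \mathit{pt}(y)\\
x = *y  &:\quad \mathit{pt}(x) = \mathit{pt}(\mathit{pt}(y))\\
*x = y  &:\quad \mathit{pt}(\mathit{pt}(x)) = \mathit{pt}(y)
\end{aligned}
```

Here pt(v) is the equivalence class of locations v may point to and class(y) is the class containing y; an Andersen-style analysis would use subset constraints in place of these equalities, which is exactly the precision the equality formulation trades for speed.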

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration, and more complex variants add additional links to allow efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that access time is linear; faster access, such as random access, is not feasible. Arrays have better cache locality than linked lists.
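
A minimal sketch of the basic form described above (illustrative Python, not taken from any particular library):

```python
# Singly linked list: each node holds data and a reference to the next node.
class ListNode:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node   # None marks the end of the list

def prepend(head, data):
    """Insertion at a known position only rewrites links, so it is O(1)."""
    return ListNode(data, head)

def traverse(head):
    """Access is linear: reaching the k-th element follows k links."""
    node = head
    while node is not None:
        yield node.data
        node = node.next

head = prepend(prepend(prepend(None, 3), 2), 1)
print(list(traverse(head)))  # [1, 2, 3]
```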

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.

C++ is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup and first released in 1985 as an extension of the C programming language, or "C with Classes". The language has expanded significantly over time, and modern C++ now has object-oriented, generic, and functional features in addition to facilities for low-level memory manipulation. It is almost always implemented as a compiled language, and many vendors provide C++ compilers, including the Free Software Foundation, LLVM, Microsoft, Intel, Embarcadero, Oracle, and IBM, so it is available on many platforms.

In computer science and computer programming, a data type is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. A data type specification in a program constrains the possible values that an expression, such as a variable or a function call, might take. On literal data, it tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans.

Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Ada, C#, Delphi, Eiffel, F#, Java, Nim, Python, Go, Rust, Swift, TypeScript and Visual Basic .NET. They are known as parametric polymorphism in ML, Scala, Julia, and Haskell; templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.
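
As a small illustration, a generic function written in Python's typing notation (the names here are arbitrary):

```python
# `first` is written in terms of a to-be-specified element type T and can be
# used with any sequence, whatever type its elements turn out to have.
from typing import Sequence, TypeVar

T = TypeVar("T")

def first(items: Sequence[T]) -> T:
    """Return the first element; the result type matches the element type T."""
    return items[0]

print(first([1, 2, 3]), first(["a", "b"]))  # 1 a
```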

In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, operates with less memory or other resources, or draws less power.

In computer science, a stack is an abstract data type that serves as a collection of elements with two main operations: push, which adds an element to the collection, and pop, which removes the most recently added element.
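
A tiny sketch of those two operations, using a Python list as the backing store (one common way to model a stack, not the only one):

```python
# Last in, first out: push and pop both act on the same end of the sequence.
stack = []
stack.append("a")    # push "a"
stack.append("b")    # push "b"
top = stack.pop()    # pop returns "b", the most recently added element
print(top, stack)    # b ['a']
```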

The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell. It provides a cross-platform environment for the writing and testing of Haskell code and it supports numerous extensions, libraries, and optimisations that streamline the process of generating and executing code. GHC is the most commonly used Haskell compiler. The lead developers are Simon Peyton Jones and Simon Marlow.

LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and is not officially an acronym.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.
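
A compact sketch of such a structure with the usual optimizations, path compression and union by rank, which make the operations run in near-constant amortized time (this is the ingredient behind the almost-linear bound of Steensgaard's analysis; the code below is illustrative, not a specific library's API):

```python
# Disjoint-set (union-find): add singleton sets, merge sets, find representatives.
class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def add(self, x):
        """Create a new singleton set containing x."""
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Merge the sets containing x and y (union by rank)."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

ds = DisjointSet()
for v in "abc":
    ds.add(v)
ds.union("a", "b")
print(ds.find("a") == ds.find("b"), ds.find("a") == ds.find("c"))  # True False
```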

In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail call is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion is particularly useful, and is often easy to optimize in implementations.
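
For example, in the following sketch the recursive call is the last action performed, so the function is tail recursive (note that CPython does not eliminate tail calls, so this only illustrates the shape):

```python
def gcd(a: int, b: int) -> int:
    # The recursive call below is in tail position: nothing remains to be
    # done in this frame after it returns.
    if b == 0:
        return a
    return gcd(b, a % b)

print(gcd(48, 18))  # 6
```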

SIGPLAN is the Association for Computing Machinery's Special Interest Group on programming languages.

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

In computer science, a linked data structure is a data structure which consists of a set of data records (nodes) linked together and organized by references. The link between data can also be called a connector.

In computer science, an array is a data type that represents a collection of elements, each selected by one or more indices that can be computed at run time during program execution. Such a collection is usually called an array variable or array value. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by analogy with the physical concept, tensor.

In program analysis, a polyvariant or context-sensitive analysis analyzes each function multiple times—typically once at each call site—to improve the precision of the analysis. Polyvariance is common in data-flow and pointer analyses.

Yannis Smaragdakis is a Greek-American software engineer, computer programmer, and researcher. He is a professor in the Department of Informatics and Telecommunications at the University of Athens. He is the author of more than 130 research articles on a variety of topics, including program analysis, declarative languages, program generators, language design, and concurrency. He is best known for work in program generation and program analysis and the Doop framework.

References

  1. "LLVM Alias Analysis Infrastructure — LLVM 8 documentation". releases.llvm.org. Retrieved 2022-04-22.
  2. (Smaragdakis & Balatsouras 2015, p. 14)