Program dependence graph

A Program Dependence Graph (PDG) is a directed graph of a program's control and data dependencies. Nodes represent program statements and edges represent dependencies between these statements.

PDGs are used in program optimization, debugging, and program comprehension. For example, optimizing compilers build them during dependence analysis to decide which transformations, such as those that expose parallelism, preserve the program's behavior. [1] [2]
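
As a minimal illustrative sketch (not taken from the cited papers), the dependencies of a short fragment can be recorded as labelled adjacency lists in Python; the statement labels and the fragment itself are invented for illustration.

```python
# Program dependence graph for the fragment:
#   S1: x = read()
#   S2: if x > 0:
#   S3:     y = x + 1
# Nodes are statements; edges are labelled "data" or "control".
pdg = {
    "S1": [("S2", "data"),     # S2 tests x, which S1 defines
           ("S3", "data")],    # S3 uses x, which S1 defines
    "S2": [("S3", "control")], # S3 executes only if S2's condition holds
    "S3": [],
}

for src, edges in pdg.items():
    for dst, kind in edges:
        print(f"{src} -{kind}-> {dst}")
```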

See also

Dependency graph

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption.

Control-flow graph: graphical representation of a computer program or algorithm

In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that Reese T. Prosser had earlier used boolean connectivity matrices for flow analysis.
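
A minimal sketch of a control-flow graph for an if/else, using plain dictionaries; the basic-block names are invented for illustration.

```python
# Control-flow graph of:
#   entry: read n; branch on n % 2
#   even_branch / odd_branch both fall through to exit
cfg = {
    "entry":       ["even_branch", "odd_branch"],
    "even_branch": ["exit"],
    "odd_branch":  ["exit"],
    "exit":        [],
}

# Enumerate every path from entry to exit ("all paths that might be traversed").
def paths(node, path=()):
    path = path + (node,)
    if not cfg[node]:
        yield path
    for succ in cfg[node]:
        yield from paths(succ, path)

for p in paths("entry"):
    print(" -> ".join(p))
```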

In compiler design, static single assignment (SSA) form is a type of intermediate representation (IR) in which each variable is assigned exactly once. SSA is used in most high-quality optimizing compilers for imperative languages, including LLVM, the GNU Compiler Collection, and many commercial compilers.
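
A hedged before/after sketch of SSA renaming, written as ordinary Python assignments; the variable subscripts and the phi notation follow the usual textbook convention rather than any particular compiler.

```python
# Before SSA: x is assigned twice.
#   x = 1
#   x = x + 2
#   y = x * 3
#
# After SSA: each assignment defines a fresh version of the variable.
x1 = 1
x2 = x1 + 2
y1 = x2 * 3
print(y1)  # 9

# With control flow, SSA inserts a phi function at join points, e.g.
#   x3 = phi(x1, x2)  # selects x1 or x2 depending on the incoming edge
```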

In compiler theory, dead-code elimination is a compiler optimization to remove dead code. Removing such code has several benefits: it shrinks program size, an important consideration in some contexts; it reduces resource usage, such as the number of bytes to be transferred; and it lets the running program avoid executing irrelevant operations, which reduces its running time. It can also enable further optimizations by simplifying program structure. Dead code includes code that can never be executed, and code that affects only dead variables, that is, variables irrelevant to the program.
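
A small illustrative example showing both kinds of dead code mentioned above; the function names are hypothetical, and the elimination is done by hand here.

```python
def before(n):
    unused = n * 10           # dead: assigns a variable that is never read
    if False:
        print("unreachable")  # dead: can never execute
    return n + 1

def after(n):
    # Dead-code elimination keeps only the statements that affect the result.
    return n + 1

assert before(4) == after(4) == 5
```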

LLVM: compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail call is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion is particularly useful, and is often easy to optimize in implementations.
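
A short illustration of a tail call versus a non-tail call; note that CPython does not actually perform tail-call optimization, so this only shows the shape of the code.

```python
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)  # not a tail call: the multiply runs after the call returns

def factorial_tail(n, acc=1):
    if n == 0:
        return acc
    return factorial_tail(n - 1, acc * n)  # tail call: the recursive call is the final action

print(factorial(5), factorial_tail(5))  # 120 120
```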

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, without changing the meaning of the code, it tries to avoid pipeline stalls by rearranging the order of instructions and to avoid illegal or semantically ambiguous operations.
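
A toy sketch of the idea: two independent load/use chains are interleaved so that a use does not immediately follow its long-latency load. The instruction strings, register names, and latencies are all invented.

```python
# Each instruction: (text, registers read, registers written, latency in cycles).
instrs = [
    ("load r1, [a]",   set(),  {"r1"}, 3),
    ("add  r2, r1, 1", {"r1"}, {"r2"}, 1),
    ("load r3, [b]",   set(),  {"r3"}, 3),
    ("add  r4, r3, 1", {"r3"}, {"r4"}, 1),
]

# The original order stalls twice: each add waits on the load just before it.
# A scheduler can interleave the two independent chains to hide the latency:
scheduled = [instrs[0], instrs[2], instrs[1], instrs[3]]
for text, *_ in scheduled:
    print(text)
```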

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.
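
A sketch of a tuple-based (three-address style) IR for the expression d = (a + b) * c; the tuple layout is illustrative rather than taken from any specific compiler.

```python
# Three-address style IR for: d = (a + b) * c
ir = [
    ("add", "t1", "a", "b"),   # t1 = a + b
    ("mul", "d", "t1", "c"),   # d  = t1 * c
]

# A tiny interpreter over the IR shows it is independent of the source syntax.
env = {"a": 2, "b": 3, "c": 4}
for op, dst, x, y in ir:
    env[dst] = env[x] + env[y] if op == "add" else env[x] * env[y]
print(env["d"])  # 20
```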

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.
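
One common loop optimization, loop unrolling, shown by hand as a before/after pair; the function names and data are invented, and a compiler would apply the transformation automatically.

```python
def sum_rolled(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled(xs):
    # Loop unrolling: process four elements per iteration to cut loop overhead.
    total, i, n = 0, 0, len(xs)
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:  # handle any leftover elements
        total += xs[i]
        i += 1
    return total

data = list(range(10))
assert sum_rolled(data) == sum_unrolled(data) == 45
```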

In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement S2 depends on S1 if S1 must be executed before S2. Broadly, there are two classes of dependencies: control dependencies and data dependencies.
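
A short illustration of the two kinds of dependence, using an invented three-statement fragment.

```python
a = 5           # S1: defines a
if a > 0:       # S2: data-dependent on S1 (reads a)
    b = a * 2   # S3: control-dependent on S2 (executes only if the test is true)
                #     and data-dependent on S1 (reads a)
    print(b)
```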

In computer science, loop dependence analysis is a process which can be used to find dependencies within iterations of a loop with the goal of determining different relationships between statements. These dependent relationships are tied to the order in which different statements access memory locations. Using the analysis of these relationships, execution of the loop can be organized to allow multiple processors to work on different portions of the loop in parallel. This is known as parallel processing. In general, loops can consume a lot of processing time when executed as serial code. Through parallel processing, it is possible to reduce the total execution time of a program through sharing the processing load among multiple processors.
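
A small sketch of the distinction loop dependence analysis draws: the first loop carries a dependence between iterations and must run serially as written, while the second has independent iterations that could be split across processors. The array names are invented.

```python
n = 8
a = [1] * n
b = [0] * n

# Loop-carried dependence: iteration i reads a[i-1], written by iteration i-1,
# so the iterations cannot safely run in parallel as written.
for i in range(1, n):
    a[i] = a[i - 1] + 1

# No loop-carried dependence: each iteration touches only its own index,
# so the iterations could be distributed across processors.
for i in range(n):
    b[i] = a[i] * 2

print(a, b)
```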

Automatic parallelization, also auto parallelization, or autoparallelization refers to converting sequential code into multi-threaded and/or vectorized code in order to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. Fully automatic parallelization of sequential programs is a challenge because it requires complex program analysis and the best approach may depend upon parameter values that are not known at compilation time.
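
A hedged sketch of the kind of transformation an auto-parallelizer aims for, written by hand here with Python's multiprocessing pool; the work function and data are invented, and a real compiler would first have to prove the iterations independent.

```python
from multiprocessing import Pool

def work(x):
    return x * x  # an independent per-element computation (invented example)

data = list(range(16))

# Sequential loop.
serial = [work(x) for x in data]

if __name__ == "__main__":
    # "Parallelized" version: iterations spread over multiple worker processes.
    with Pool(4) as pool:
        parallel = pool.map(work, data)
    assert parallel == serial
```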

Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions: c1 = a1 + b1; c2 = a2 + b2; c3 = a3 + b3; c4 = a4 + b4.
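
An analogous contrast in Python, using NumPy's element-wise addition as a stand-in for a hardware vector operation; this illustrates the idea rather than actual SIMD code generation.

```python
import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Scalar version: one addition per loop iteration.
c_scalar = [a[i] + b[i] for i in range(len(a))]

# "Vector" version: one element-wise operation over all pairs at once.
c_vector = np.array(a) + np.array(b)

assert c_scalar == c_vector.tolist()  # [11, 22, 33, 44]
```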

Mathomatic: computer algebra system

Mathomatic is a free, portable, general-purpose computer algebra system (CAS) that can symbolically solve, simplify, combine and compare algebraic equations, and can perform complex number, modular, and polynomial arithmetic, along with standard arithmetic. It does some symbolic calculus (derivative, extrema, Taylor series, and polynomial integration and Laplace transforms), numerical integration, and handles all elementary algebra except logarithms. Trigonometric functions can be entered and manipulated using complex exponentials, with the GNU m4 preprocessor. Not currently implemented are general functions like f(x), arbitrary-precision and interval arithmetic, and matrices.

A data dependency in computer science is a situation in which a program statement (instruction) refers to the data of a preceding statement. In compiler theory, the technique used to discover data dependencies among statements is called dependence analysis.
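
A compact illustration of the three classic kinds of data dependence (flow, anti, and output), using an invented straight-line fragment.

```python
x = 1        # S1
y = x + 1    # S2: flow (read-after-write) dependence on S1
x = 2        # S3: anti (write-after-read) dependence on S2,
             #     output (write-after-write) dependence on S1
print(x, y)  # 2 2
```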

Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimizations by analyzing the entire program as opposed to a single function or block of code.
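
One interprocedural technique, function inlining, shown by hand as a before/after pair; the function names are invented, and a compiler would perform the rewrite automatically after analyzing the whole program.

```python
def scale(x):
    return x * 3

def before(values):
    return [scale(v) + 1 for v in values]

def after(values):
    # The call to scale has been inlined, removing per-element call overhead
    # and exposing the combined expression to further optimization.
    return [v * 3 + 1 for v in values]

assert before([1, 2]) == after([1, 2]) == [4, 7]
```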

Use of the polyhedral model within a compiler requires software to represent the objects of this framework, such as sets of integer points bounded by affine constraints, and to perform operations upon them.

In computer science, code motion, also known as code hoisting, code sinking, loop-invariant code motion, or code factoring, is a blanket term for any process that moves code within a program for performance or size benefits, and is a common optimization performed in most optimizing compilers. It can be difficult to differentiate between different types of code motion, due to the inconsistent meaning of the terms surrounding it.
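
A before/after sketch of one such process, loop-invariant code motion: a computation that does not change across iterations is hoisted out of the loop. The function names are invented and the transformation is done by hand here.

```python
import math

def before(values, r):
    out = []
    for v in values:
        area = math.pi * r * r  # loop-invariant: recomputed every iteration
        out.append(v * area)
    return out

def after(values, r):
    area = math.pi * r * r      # hoisted out of the loop (code motion)
    return [v * area for v in values]

assert before([1, 2], 3) == after([1, 2], 3)
```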

In computer science, a code property graph (CPG) is a computer program representation that captures syntactic structure, control flow, and data dependencies in a property graph. The concept was originally introduced to identify security vulnerabilities in C and C++ system code, but has since been employed to analyze web applications, cloud deployments, and smart contracts. Beyond vulnerability discovery, code property graphs find applications in code clone detection, attack-surface detection, exploit generation, measuring code testability, and backporting of security patches.

References

  1. Ferrante, Jeanne; Ottenstein, Karl J.; Warren, Joe D. (July 1987). "The Program Dependence Graph and its Use in Optimization" (PDF). ACM Transactions on Programming Languages and Systems. 9 (3): 319–349. CiteSeerX 10.1.1.101.27. doi:10.1145/24039.24041. S2CID 505075.
  2. "Dependence Graphs in LLVM — LLVM 19.0.0git documentation". llvm.org. Retrieved 2024-06-26.