Profile-guided optimization

Profile-guided optimization (PGO, sometimes pronounced as "pogo" [1]), also known as profile-directed feedback (PDF) [2] or feedback-directed optimization (FDO), [3] is a compiler optimization technique in computer programming that uses profiling to improve program runtime performance.

Method

Optimization techniques based on static program analysis of the source code consider possible performance improvements without actually executing the program; no dynamic program analysis is performed. Such analysis can even reason about code within loops, including how many times a loop will execute, for example to drive loop unrolling. In the absence of run-time information, however, static program analysis cannot take into account how frequently a given code section is actually executed.
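For example, a loop whose trip count is known at compile time can be unrolled without any run-time information. A minimal sketch in C (the function names are illustrative):

    #include <stddef.h>

    /* Original loop: the trip count (1024) is visible to static analysis. */
    long sum(const int *a) {
        long s = 0;
        for (size_t i = 0; i < 1024; i++)
            s += a[i];
        return s;
    }

    /* The same loop unrolled by a factor of 4: one branch per four
     * additions. A compiler can justify this transformation statically,
     * without ever executing the program. */
    long sum_unrolled(const int *a) {
        long s = 0;
        for (size_t i = 0; i < 1024; i += 4)
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        return s;
    }

What static analysis cannot tell the compiler is whether sum is hot enough for the extra code size to pay off; that is the gap profile-guided optimization fills.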

The first high-level compiler, introduced as the Fortran Automatic Coding System in 1957, broke the code into blocks and devised a table of the frequency with which each block is executed, via a simulated execution of the code in a Monte Carlo fashion in which the outcome of conditional transfers (as via IF-type statements) was determined by a random number generator suitably weighted by whatever FREQUENCY statements were provided by the programmer. [4]

Rather than relying on programmer-supplied frequency information, profile-guided optimization uses the results of profiling test runs of an instrumented program to optimize the final generated code. [5] [6] [7] The compiler accesses profile data from a sample run of the program across a representative input set. The results indicate which areas of the program are executed more frequently and which are executed less frequently. Optimizations benefit from profile-guided feedback because it makes them less reliant on heuristics when making compilation decisions. The caveat, however, is that the data fed to the program during the profiling stage must be statistically representative of typical usage scenarios; otherwise, profile-guided feedback has the potential to harm the overall performance of the final build rather than improve it.
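In practice this is a dual-compilation workflow: build an instrumented binary, exercise it on representative input, then rebuild using the collected profile. A minimal sketch assuming Clang/LLVM's IR-based PGO (the file names, demo program, and branch ratio are illustrative):

    /* pgo_demo.c -- instrumentation-based PGO with Clang:
     *
     *   1. Instrumented build:
     *      clang -O2 -fprofile-generate pgo_demo.c -o pgo_demo
     *   2. Training run on representative input (writes raw profile data):
     *      LLVM_PROFILE_FILE=pgo_demo.profraw ./pgo_demo 1000000
     *   3. Merge the raw data into a profile the compiler can read:
     *      llvm-profdata merge -output=pgo_demo.profdata pgo_demo.profraw
     *   4. Optimized rebuild guided by the profile:
     *      clang -O2 -fprofile-use=pgo_demo.profdata pgo_demo.c -o pgo_demo
     */
    #include <stdio.h>
    #include <stdlib.h>

    static int rare_path(int x)   { return x * 31; }
    static int common_path(int x) { return x + 1; }

    int main(int argc, char **argv) {
        long n = argc > 1 ? strtol(argv[1], NULL, 10) : 1000;
        long acc = 0;
        for (long i = 0; i < n; i++) {
            /* The profile will record that this branch is taken only
             * once per thousand iterations. */
            if (i % 1000 == 0)
                acc += rare_path((int)i);
            else
                acc += common_path((int)i);
        }
        printf("%ld\n", acc);
        return 0;
    }

With the profile in hand, the compiler knows rare_path is cold: it can inline common_path, order basic blocks so the frequent branch falls through, and move the cold code out of the hot instruction stream.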

Just-in-time compilation can make use of runtime information to dynamically recompile parts of the executed code into more efficient native code. If the dynamic profile changes during execution, the system can deoptimize the previously generated native code and generate new code optimized with information from the new profile.

Adoption

There is support for building Firefox using PGO. [8] Even though PGO is effective, it has not been widely adopted by software projects, owing to its tedious dual-compilation model. [9] It is also possible to perform PGO without instrumentation, by collecting a profile with hardware performance counters. [9] This sampling-based approach has much lower overhead and does not require a special instrumented build, as sketched below.
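A sampling-based collection might look like the following sketch, which assumes the Linux perf tool, the AutoFDO conversion utility create_llvm_prof, and Clang's -fprofile-sample-use option; the exact tool flags are version-dependent and shown only as an illustration:

    /* app.c -- sampling-based PGO (AutoFDO-style), no instrumented build:
     *
     *   1. Ordinary optimized build, with debug info so samples map to source:
     *      clang -O2 -g app.c -o app
     *   2. Sample a production-like run using hardware branch records:
     *      perf record -b ./app
     *   3. Convert the samples into a compiler-readable profile
     *      (create_llvm_prof reads perf.data by default; flags assumed):
     *      create_llvm_prof --binary=./app --out=app.prof
     *   4. Rebuild guided by the sampled profile:
     *      clang -O2 -g -fprofile-sample-use=app.prof app.c -o app
     */
    #include <stdio.h>

    int main(void) {
        /* A representative hot loop for the sampler to observe. */
        double x = 0.0;
        for (long i = 1; i < 100000000L; i++)
            x += 1.0 / (double)i;
        printf("%f\n", x);
        return 0;
    }

Because the profiled binary is an ordinary optimized build, the profile can come from a real production run rather than a dedicated training run; this is the approach behind Google's AutoFDO system described below.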

The HotSpot Java virtual machine (JVM) uses profile-guided optimization to dynamically generate native code. As a consequence, a software binary is optimized for the actual load it is receiving, and if the load changes, adaptive optimization can dynamically recompile the running software to optimize it for the new load. This means that all software executed on the HotSpot JVM effectively makes use of profile-guided optimization. [10]

PGO has been adopted in the Microsoft Windows version of Google Chrome. PGO was enabled in the 64-bit edition of Chrome starting with version 53, and in the 32-bit edition starting with version 54. [11]

Google has published a paper [12] describing AutoFDO, [13] a system in production use that applies profiles gathered from production workloads to guide builds, yielding performance improvements of up to 10%.

Implementations

Examples of compilers that implement PGO include:

  - Intel C++ Compiler and Intel Fortran Compiler [6] [7]
  - GCC
  - Microsoft Visual C++ [1] [14]
  - Clang [15]
  - IBM XL C/C++ [16]
  - GraalVM Native Image [17]
  - The .NET JIT compiler, which added dynamic PGO in .NET 6 [18]

References

  1. "Microsoft Visual C++ Team Blog". 12 November 2008.
  2. "Profile-directed feedback (PDF)". XL C/C++ for AIX. Retrieved 23 November 2013.
  3. Wicht, Baptiste; Vitillo, Roberto A.; Chen, Dehao; Levinthal, David (24 November 2014). "Hardware Counted Profile-Guided Optimization". arXiv:1411.6361. Bibcode:2014arXiv1411.6361W.
  4. J. W. Backus, R. J. Beeber, et al., The Fortran Automatic Coding System, Proceedings of the Western Joint Computer Conference, February 1957, p. 195
  5. "K. Pettis, R. Hansen, Profile Guided Code Positioning, ACM SIGPLAN Programming Language Design and Implementation Conference 1990" (PDF).
  6. "Intel Fortran Compiler 10.1, Professional and Standard Editions, for Mac OS X". Archived from the original on 28 September 2013.
  7. "Profile-Guided Optimization (PGO) Quick Reference".
  8. Building with Profile-Guided Optimization, mozilla.org, 13 August 2013
  9. Dehao Chen (2010), "Taming hardware event samples for fdo compilation", Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, pp. 42–52.
  10. Ivanov, Vladimir (25 July 2013). "JVM JIT compilation overview" . Retrieved 10 September 2016.
  11. Marchand, Sébastien (31 October 2016). "Making Chrome on Windows faster with PGO". Archived from the original on 1 November 2016. Retrieved 1 November 2016.
  12. Chen, Dehao; Li, David Xinliang; Moseley, Tipp (2016). "AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications". Proceedings of the 2016 International Symposium on Code Generation and Optimization. New York, NY, USA. pp. 12–23. doi:10.1145/2854038.2854044. ISBN 978-1-4503-3778-6. S2CID 17473127.
  13. "1. Install prerequisite". Google. 14 June 2022. Retrieved 21 June 2022.
  14. "Profile-guided optimizations[VS 2019]". 18 October 2022.
  15. "Profile-guided optimization [Clang Compiler User's Manual]".
  16. Quintero, Dino; Chabrolles, Sebastien; Chen, Chi Hui; Dhandapani, Murali; Holloway, Talor; Jadhav, Chandrakant; Kim, Sae Kee; Kurian, Sijo; Raj, Bharath; Resende, Ronan; Roden, Bjorn; Srinivasan, Niranjan; Wale, Richard; Zanatta, William; Zhang, Zhi; Redbooks, I. B. M. (1 May 2013). IBM Power Systems Performance Guide: Implementing and Optimizing. IBM Redbooks. ISBN   978-0-7384-3766-8 via Google Books.
  17. "Optimize a Native Executable with Profile-Guided Optimizations [GraalVM How-to Guides]".
  18. "What's new in .NET 6: Profile-guided optimization". 26 May 2023.