Gprof

Last updated

Gprof is a performance analysis tool for Unix applications. It used a hybrid of instrumentation and sampling [1] and was created as an extended version of the older "prof" tool. Unlike prof, gprof is capable of limited call graph collecting and printing. [1] [2]

Contents

History

GPROF was originally written by a group led by Susan L. Graham at the University of California, Berkeley for Berkeley Unix (4.2BSD [3] ). Another implementation was written as part of the GNU project for GNU Binutils in 1988 by Jay Fenlason. [4] [5]

Implementation

Instrumentation code is automatically inserted into the program code during compilation (for example, by using the '-pg' option of the gcc compiler), to gather caller-function data. A call to the monitor function 'mcount' is inserted before each function call. [6]

Sampling data is saved in 'gmon.out' or in 'progname.gmon' file just before the program exits, and can be analyzed with the 'gprof' command-line tool. Several gmon files can be combined with 'gprof -s' to accumulate data from several runs of a program.

GPROF output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

The second part of the output is the textual call graph, which shows for each function who called it (parent) and who it called (child subroutines). There is an external tool called gprof2dot capable of converting the call graph from gprof into graphical form. [7]

Limitations and accuracy

At run-time, timing values are obtained by statistical sampling. Sampling is done by probing the target program's program counter at regular intervals using operating system interrupts (programmed via profil(2) or setitimer(2) syscalls). The resulting data is not exact, rather a statistical approximation. The amount of error is usually more than one sampling period. If a value is n times the sampling period, the expected error in the value is the square root of n sampling periods. [8] [9] A typical sampling period is 0.01 second (10 milliseconds) or 0.001 second (1 ms) or in other words 100 or 1000 samples per second of CPU running time.

In some versions, such as BSD, profiling of shared libraries can be limited because of restrictions of the profil function, which may be implemented as library function or as system call. There were analogous utility in glibc called 'sprof' to profile dynamic libraries. [10]

Gprof cannot measure time spent in kernel mode (syscalls, waiting for CPU or I/O waiting), and only user-space code is profiled. [9]

The mcount function may not be thread-safe in some implementations, so multi-threaded application profiles can be incorrect (typically it only profiles the main thread of application). [11]

Instrumentation overhead can be high (estimated as 30% [12] -260% [13] ) for higher-order or object-oriented programs. Mutual recursion and non-trivial cycles are not resolvable by the gprof approach (context-insensitive call graph), because it only records arc traversal, not full call chains. [13] [14] [15]

Gprof with call-graph collecting can be used only with compatible compilers, like GCC, clang/LLVM and some other.

Reception

In 2004 a GPROF paper appeared on the list of the 50 most influential PLDI papers of all time as one of four papers of 1982 year. [16]

According to Thiel, [6] "GPROF ... revolutionized the performance analysis field and quickly became the tool of choice for developers around the world ... the tool still maintains a large following ... the tool is still actively maintained and remains relevant in the modern world."

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

<span class="mw-page-title-main">Netwide Assembler</span> Assembler for the Intel x86 architecture

The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture. It can be used to write 16-bit, 32-bit (IA-32) and 64-bit (x86-64) programs. It is considered one of the most popular assemblers for Linux and x86 chips.

<span class="mw-page-title-main">System call</span> Way for programs to access kernel services

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

<span class="mw-page-title-main">GNU Autotools</span> Suite of programming tools

The GNU Autotools, also known as the GNU Build System, is a suite of programming tools designed to assist in making source code packages portable to many Unix-like systems.

In computing, just-in-time (JIT) compilation is compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

<span class="mw-page-title-main">OpenMP</span> Open standard for parallelizing

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

<span class="mw-page-title-main">Valgrind</span> Programming tool for profiling, memory debugging and memory leak detection

Valgrind is a programming tool for memory debugging, memory leak detection, and profiling.

In computing, POSIX Threads, commonly known as pthreads, is an execution model that exists independently from a programming language, as well as a parallel execution model. It allows a program to control multiple different flows of work that overlap in time. Each flow of work is referred to as a thread, and creation and control over these flows is achieved by making calls to the POSIX Threads API. POSIX Threads is an API defined by the Institute of Electrical and Electronics Engineers (IEEE) standard POSIX.1c, Threads extensions .

SIGPLAN is the Association for Computing Machinery's Special Interest Group on programming languages.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.

<span class="mw-page-title-main">Binary Modular Dataflow Machine</span>

Binary Modular Dataflow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

<span class="mw-page-title-main">Call graph</span> Structure in computing

A call graph is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge (f, g) indicates that procedure f calls procedure g. Thus, a cycle in the graph indicates recursive procedure calls.

Dynamic program analysis is analysis of computer software that involves executing the program in question. Dynamic program analysis includes familiar techniques from software engineering such as unit testing, debugging, and measuring code coverage, but also includes lesser-known techniques like program slicing and invariant inference. Dynamic program analysis is widely applied in security in the form of runtime memory error detection, fuzzing, dynamic symbolic execution, and taint tracking.

Thread Level Speculation (TLS), also known as Speculative Multi-threading, or Speculative Parallelization, is a technique to speculatively execute a section of computer code that is anticipated to be executed later in parallel with the normal execution on a separate independent thread. Such a speculative thread may need to make assumptions about the values of input variables. If these prove to be invalid, then the portions of the speculative thread that rely on these input variables will need to be discarded and squashed. If the assumptions are correct the program can complete in a shorter time provided the thread was able to be scheduled efficiently.

A computer program may sleep, which places it into an inactive state for a period of time. Eventually the expiration of an interval timer, or the receipt of a signal or interrupt causes the program to resume execution.

<span class="mw-page-title-main">Incremental computing</span> Software feature

Incremental computing, also known as incremental computation, is a software feature which, whenever a piece of data changes, attempts to save time by only recomputing those outputs which depend on the changed data. When incremental computing is successful, it can be significantly faster than computing new outputs naively. For example, a spreadsheet software package might use incremental computation in its recalculation feature, to update only those cells containing formulas which depend on the changed cells.

ProbeVue is IBM's implementation of a lightweight dynamic tracing environment introduced in AIX version 6.1. ProbeVue provides the ability to probe running processes in order to provide statistical analysis as well as retrieve data from the probed process. The dynamic nature of ProbeVue allows it to be used as a global system performance tool while retaining the ability to drill into very specific events on a single process or thread.

For several years parallel hardware was only available for distributed computing but recently it is becoming available for the low end computers as well. Hence it has become inevitable for software programmers to start writing parallel applications. It is quite natural for programmers to think sequentially and hence they are less acquainted with writing multi-threaded or parallel processing applications. Parallel programming requires handling various issues such as synchronization and deadlock avoidance. Programmers require added expertise for writing such applications apart from their expertise in the application domain. Hence programmers prefer to write sequential code and most of the popular programming languages support it. This allows them to concentrate more on the application. Therefore, there is a need to convert such sequential applications to parallel applications with the help of automated tools. The need is also non-trivial because large amount of legacy code written over the past few decades needs to be reused and parallelized.

Gcov is a source code coverage analysis and statement-by-statement profiling tool. Gcov generates exact counts of the number of times each statement in a program is executed and annotates source code to add instrumentation. Gcov comes as a standard utility with the GNU Compiler Collection (GCC) suite.

References

  1. 1 2 Susan L. Graham, Peter B. Kessler, and Marshall K. Mckusick. gprof: a Call Graph Execution Profiler // Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, SIGPLAN Notices, Vol. 17, No 6, pp. 120-126; doi: 10.1145/800230.806987
  2. gprof --- Call Graph // Ping Huang, Reinventing Computing, MIT AI Lab
  3. HISTORY The gprof profiler appeared in 4.2BSD
  4. GNU gprof manual: "GNU gprof was written by Jay Fenlason."
  5. GNU's Bulletin, vol. 1 no. 5 (1988): "Gprof replacement Foundation staffer Jay Fenlason has recently completed a profiler to go with GNU C, compatible with `GPROF' from Berkeley Unix. "
  6. 1 2 Justin Thiel, An Overview of Software Performance Analysis Tools and Techniques: From GProf to DTrace (2006) "2.1.1 Overview of GProf"
  7. Gprof call graph visualization // Cookbook for scientific computing. Python cookbook. École polytechnique fédérale de Lausanne (EPFL)
  8. Statistical Inaccuracy of gprof Output Archived 2012-05-29 at the Wayback Machine
  9. 1 2 gprof Profiling Tools on BG/P Systems Archived 2013-12-21 at the Wayback Machine , "Issues in Interpreting Profile Data", Argonne Leadership Computing Facility
  10. "The qprof project". HP Labs, Research (archived). Retrieved 28 September 2023.
  11. HOWTO: using gprof with multithreaded applications // Sam Hocevar, 2004-12-13
  12. GNU gprof Profiler Archived 2015-12-08 at the Wayback Machine , Yu Kai Hong, Department of Mathematics at National Taiwan University; July 19, 2008
  13. 1 2 Low-Overhead Call Path Profiling of Unmodified,Optimized Code, ACM 1-59593-167/8/06/2005 .
  14. J. M. Spivey Fast, accurate call graph profiling Archived 2012-02-07 at the Wayback Machine , September 3, 2003 // Software—Practice & Experience archive, Volume 34 Issue 3, March 2004, Pages 249 - 264 Spivey, J. M. (2004). "Fast, accurate call graph profiling". Software: Practice and Experience. 34 (3): 249–264. CiteSeerX   10.1.1.62.1032 . doi:10.1002/spe.562. S2CID   17866706.
  15. Yossi Kreinin, How profilers lie: the cases of gprof and KCachegrind // February 2nd, 2013
  16. 20 Years of PLDI (1979–1999): A Selection, Kathryn S. McKinley, Editor

Further reading