Gprof is a performance analysis tool for Unix applications. It uses a hybrid of instrumentation and sampling [1] and was created as an extended version of the older "prof" tool. Unlike prof, gprof can collect and print limited call graph information. [1] [2]
Gprof was originally written by a group led by Susan L. Graham at the University of California, Berkeley, for Berkeley Unix (4.2BSD). [3] Another implementation was written as part of the GNU project for GNU Binutils in 1988 by Jay Fenlason. [4] [5]
Instrumentation code is automatically inserted into the program code during compilation (for example, by using the '-pg' option of the gcc compiler) to gather caller-function data. A call to the monitor function 'mcount' is inserted into the prologue of each compiled function. [6]
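As a minimal sketch (the file name 'example.c' and the function 'work' are invented for illustration and are not part of gprof), a program intended for profiling needs no special code; the compiler inserts the instrumentation:

    /* example.c -- toy program for profiling; all names are illustrative. */
    #include <stdio.h>

    /* When compiled with -pg, the compiler inserts a call to mcount
       in the prologue of this function (and of main), so each entry
       records the caller/callee arc. */
    static long work(long n)
    {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i % 7;
        return sum;
    }

    int main(void)
    {
        long total = 0;
        for (int i = 0; i < 1000; i++)
            total += work(100000);
        printf("%ld\n", total);
        return 0;
    }

Building with 'gcc -pg example.c -o example' links in the profiling runtime; running the binary then produces the 'gmon.out' data file described below.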
Sampling data is saved in the 'gmon.out' or 'progname.gmon' file just before the program exits, and can be analyzed with the 'gprof' command-line tool. Several gmon files can be combined with 'gprof -s' to accumulate data from several runs of a program.
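Assuming the toy program above, a typical session might look like this (file names are illustrative; GNU gprof writes the merged data to 'gmon.sum'):

    gcc -pg example.c -o example        # compile and link with instrumentation
    ./example                           # run; gmon.out is written at exit
    gprof ./example gmon.out            # print flat profile and call graph
    ./example && mv gmon.out gmon.1     # keep the data from one run
    ./example && mv gmon.out gmon.2     # ...and from another
    gprof -s ./example gmon.1 gmon.2    # merge the runs into gmon.sum
    gprof ./example gmon.sum            # report on the accumulated data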
Gprof output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.
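An illustrative flat-profile excerpt for the toy program might look like the following (the figures are hypothetical, not measured):

    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
     99.80      0.49     0.49     1000     0.49     0.49  work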
The second part of the output is the textual call graph, which shows, for each function, its callers (parents) and the functions it calls (child subroutines). An external tool called gprof2dot can convert the call graph from gprof into graphical form. [7]
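For example, gprof2dot is commonly piped together with Graphviz to render the graph as an image (file names are illustrative):

    gprof ./example gmon.out | gprof2dot | dot -Tpng -o callgraph.png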
At run time, timing values are obtained by statistical sampling. Sampling is done by probing the target program's program counter at regular intervals using operating system interrupts (programmed via the profil(2) or setitimer(2) syscalls). The resulting data are not exact but a statistical approximation, and the amount of error is usually more than one sampling period. If a value is n times the sampling period, its expected error is the square root of n sampling periods. [8] [9] A typical sampling period is 0.01 seconds (10 ms) or 0.001 seconds (1 ms), i.e. 100 or 1000 samples per second of CPU running time.
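For example, with a 0.01 second sampling period, a routine credited with 1 second of run time has accumulated n = 100 samples, so its expected error is sqrt(100) = 10 sampling periods, i.e. about 0.1 second or 10% of the measured value. Quadrupling the run time (or combining four runs with 'gprof -s') halves this relative error.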
In some versions, such as BSD, profiling of shared libraries can be limited because of restrictions of the profil function, which may be implemented as a library function or as a system call. Glibc provides an analogous utility called 'sprof' for profiling dynamic libraries. [10]
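With glibc, a shared library is instead profiled by the dynamic linker under control of the LD_PROFILE environment variable; a session might look like this (the library name is illustrative; by default the profile data is written to /var/tmp, or to the directory named by LD_PROFILE_OUTPUT):

    LD_PROFILE=libexample.so.1 ./example
    sprof -p libexample.so.1 /var/tmp/libexample.so.1.profile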
Gprof cannot measure time spent in kernel mode (system calls, waiting for the CPU, or I/O waits); only user-space code is profiled. [9]
The mcount function may not be thread-safe in some implementations, so profiles of multi-threaded applications can be incorrect (typically only the main thread of the application is profiled). [11]
Instrumentation overhead can be high (estimates range from 30% [12] to 260% [13]), particularly for higher-order or object-oriented programs. Mutual recursion and non-trivial cycles cannot be resolved by the gprof approach (a context-insensitive call graph), because it records only arc traversals, not full call chains, as the example below illustrates. [13] [14] [15]
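The cycle limitation can be seen with a pair of mutually recursive functions (an illustrative sketch, not taken from gprof's documentation): gprof records the arcs is_even -> is_odd and is_odd -> is_even, but without full call chains it cannot apportion the time spent inside the cycle, so the two functions are collapsed into a single cycle in the report.

    /* cycle.c -- mutual recursion that gprof cannot attribute precisely.
       Compile with: gcc -pg -O0 cycle.c -o cycle (names are illustrative). */
    #include <stdio.h>

    static int is_odd(unsigned n);

    static int is_even(unsigned n) { return n == 0 ? 1 : is_odd(n - 1); }
    static int is_odd(unsigned n)  { return n == 0 ? 0 : is_even(n - 1); }

    int main(void)
    {
        int r = 0;
        for (unsigned i = 0; i < 1000; i++)
            r += is_even(50000 + i);
        printf("%d\n", r);
        return 0;
    }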
Call-graph collection with gprof requires a compatible compiler, such as GCC or Clang/LLVM.
In 2004, the gprof paper appeared on the list of the 50 most influential PLDI papers of all time, as one of four papers from 1982. [16]
According to Thiel, [6] "GPROF ... revolutionized the performance analysis field and quickly became the tool of choice for developers around the world ... the tool still maintains a large following ... the tool is still actively maintained and remains relevant in the modern world."