In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters.
Program analysis tools are extremely important for understanding program behavior. Computer architects need such tools to evaluate how well programs will perform on new architectures. Software writers need tools to analyze their programs and identify critical sections of code. Compiler writers often use such tools to find out how well their instruction scheduling or branch prediction algorithm is performing...
— ATOM, PLDI
The output of a profiler may take several forms; for example, an annotated source listing that shows how many times each statement was executed:
/* ------------ source ------------------------- count */
0001             IF X = "A"                        0055
0002                THEN DO
0003                  ADD 1 to XCOUNT              0032
0004                ELSE
0005             IF X = "B"                        0055
A profiler can be applied to an individual method or at the scale of a module or program, to identify performance bottlenecks by making long-running code obvious. [1] A profiler can be used to understand code from a timing point of view, with the objective of optimizing it to handle various runtime conditions [2] or various loads. [3] Profiling results can be ingested by a compiler that provides profile-guided optimization. [4] Profiling results can be used to guide the design and optimization of an individual algorithm; the Krauss matching wildcards algorithm is an example. [5] Profilers are built into some application performance management systems that aggregate profiling data to provide insight into transaction workloads in distributed applications. [6]
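For example, Python's built-in cProfile module can be applied programmatically to a single function or to a whole module. The following sketch, with a hypothetical workload function, collects timings and prints a summary sorted by cumulative time:

import cProfile
import pstats

def workload(n):
    # Hypothetical function to be profiled.
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total

profiler = cProfile.Profile()
profiler.enable()            # start collecting profiling data
workload(10_000)
profiler.disable()           # stop collecting

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)   # show the ten most expensive entries

The same module can profile an entire program from the command line with python -m cProfile myscript.py, where myscript.py stands for the program to be profiled.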
Performance-analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the program status word (PSW) at set timer intervals to detect "hot spots" in executing code.[citation needed] This was an early example of sampling (see below). In early 1974 instruction-set simulators permitted full trace and other performance-monitoring features.[citation needed]
Profiler-driven program analysis on Unix dates back to 1973,[7] when Unix systems included a basic tool, prof, which listed each function and how much of program execution time it used. In 1982 gprof extended the concept to a complete call graph analysis.[8]
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM [9] (Analysis Tools with OM). The ATOM platform converts a program into its own profiler: at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "instrumentation".
In 2004 both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers for the 20-year period ending in 1999.[10]
Flat profilers compute average call times from the calls and do not break down call times by callee or by calling context.
Call-graph profilers[8] show the call times and frequencies of the functions, as well as the call chains involved, based on the callee. In some tools the full context is not preserved.
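The difference is visible, for example, in Python's pstats module: the default report is a flat profile, while print_callers() adds a per-caller breakdown. The statistics file name below is a placeholder:

import pstats

# Statistics previously saved with: python -m cProfile -o prog.prof prog.py
stats = pstats.Stats("prog.prof")        # "prog.prof" is a placeholder file name

# Flat view: per-function totals with no caller context.
stats.sort_stats("tottime").print_stats(5)

# Call-graph style view: for each function, which callers invoked it and how often.
stats.print_callers(5)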
Input-sensitive profilers [11] [12] [13] add a further dimension to flat or call-graph profilers by relating performance measures to features of the input workloads, such as input size or input values. They generate charts that characterize how an application's performance scales as a function of its input.
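A minimal sketch of the idea, assuming a hypothetical function f whose scaling is being characterized: measure it at several input sizes and relate each timing to the size of its workload.

import time

def f(n):
    # Hypothetical function whose performance scaling we want to characterize.
    return sorted(range(n, 0, -1))

# Relate a performance measure (wall-clock time) to a feature of the input (its size).
for size in (1_000, 10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    f(size)
    elapsed = time.perf_counter() - start
    print(f"n={size:>9}  time={elapsed:.4f}s")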
Profilers, which are themselves programs, analyze target programs by collecting information about their execution. Based on their data granularity, that is, on how they collect information, profilers are classified as event-based or statistical. Because profilers interrupt program execution to collect information, their time measurements have limited resolution and should be interpreted with care. Basic-block profilers report the number of machine clock cycles devoted to executing each line of code, or a timing based on adding these together; the timings reported per basic block may not reflect the difference between cache hits and misses.[14][15]
Many programming languages and runtime environments provide event-based profilers; the Java Virtual Machine, for example, exposes the JVM Tool Interface (JVMTI) for this purpose.
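CPython's sys.setprofile hook is another concrete example: the interpreter reports an event to a callback on every function call and return. The sketch below, with a hypothetical workload, counts calls per function:

import sys
from collections import Counter

call_counts = Counter()

def tracer(frame, event, arg):
    # The interpreter reports events such as 'call', 'return',
    # 'c_call' and 'c_return' to this hook.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1

def helper(x):
    return x * x

def workload():
    # Hypothetical code being profiled.
    total = 0
    for i in range(1000):
        total += helper(i)
    return total

sys.setprofile(tracer)   # install the event-based hook
workload()
sys.setprofile(None)     # remove it

for name, count in call_counts.most_common(5):
    print(name, count)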
Some profilers operate by sampling. A sampling profiler probes the target program's call stack at regular intervals using operating system interrupts. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed.
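A minimal sketch of the mechanism on a Unix-like system, assuming a hypothetical workload function: a profiling timer delivers SIGPROF at regular intervals of CPU time, and the handler records which function was executing when the interrupt arrived.

import signal
from collections import Counter

samples = Counter()

def on_sample(signum, frame):
    # Record which function the program was executing when the timer fired.
    samples[frame.f_code.co_name] += 1

def workload():
    # Hypothetical code being sampled.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

signal.signal(signal.SIGPROF, on_sample)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)  # sample every 10 ms of CPU time
workload()
signal.setitimer(signal.ITIMER_PROF, 0, 0)        # stop sampling

for name, count in samples.most_common(5):
    print(name, count)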
The resulting data are not exact, but a statistical approximation. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods." [16]
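For example, a routine observed in 100 samples (n = 100) has an expected error of about √100 = 10 sampling periods, roughly 10% of its measured value, while a routine observed in only 4 samples can easily be off by half.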
In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as system call processing.
Still, the kernel code that handles the interrupts entails a minor loss of CPU cycles and diverted cache usage, and it cannot distinguish among the various tasks occurring in uninterruptible kernel code (microsecond-range activity).
Dedicated hardware can go beyond this: the JTAG interface of the ARM Cortex-M3 and of some recent MIPS processors includes a PCSAMPLE register, which samples the program counter in a truly undetectable manner, allowing non-intrusive collection of a flat profile.
Some commonly used[17] statistical profilers for Java/managed code are SmartBear Software's AQtime[18] and Microsoft's CLR Profiler.[19] Those profilers also support native code profiling, along with Apple Inc.'s Shark (OS X),[20] OProfile (Linux),[21] Intel VTune and Parallel Amplifier (part of Intel Parallel Studio), and Oracle Performance Analyzer,[22] among others.
Instrumentation effectively adds instructions to the target program to collect the required information. Note that instrumenting a program can cause performance changes, and may in some cases lead to inaccurate results and/or heisenbugs. The effect will depend on what information is being collected, on the level of timing detail reported, and on whether basic block profiling is used in conjunction with instrumentation.[23] For example, adding code to count every procedure/routine call will probably have less effect than counting how many times each statement is executed. A few computers have special hardware to collect information; in this case the impact on the program is minimal.
Instrumentation is key to determining the level of control and amount of time resolution available to the profilers.
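As an illustrative sketch rather than any particular tool's method, source-level instrumentation can be as lightweight as wrapping each routine so that every call increments a counter; counting every executed statement would instead require a per-line trace hook and perturbs the program far more. The routine below is hypothetical:

import functools
from collections import Counter

call_counts = Counter()

def instrumented(func):
    # Wrap a routine so that each call is counted (lightweight instrumentation).
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper

@instrumented
def routine(x):
    # Hypothetical routine under measurement.
    return x * x

for i in range(1000):
    routine(i)

print(call_counts)   # e.g. Counter({'routine': 1000})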
In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.
An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory usage, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations—algorithms that transform code to produce semantically equivalent code optimized for some aspect.
In computer science, an abstract machine is a theoretical model that allows for a detailed and precise analysis of how a computer system functions. It is similar to a mathematical function in that it receives inputs and produces outputs based on predefined rules. Abstract machines vary from literal machines in that they are expected to perform correctly and independently of hardware. Abstract machines are "machines" because they allow step-by-step execution of programs; they are "abstract" because they ignore many aspects of actual (hardware) machines. A typical abstract machine consists of a definition in terms of input, output, and the set of allowable operations used to turn the former into the latter. They can be used for purely theoretical reasons as well as models for real-world computer systems. In the theory of computation, abstract machines are often used in thought experiments regarding computability or to analyse the complexity of algorithms. This use of abstract machines is fundamental to the field of computational complexity theory, such as finite state machines, Mealy machines, push-down automata, and Turing machines.
In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness. The former focuses on improving the program's performance while reducing resource usage, while the latter focuses on ensuring that the program does what it is supposed to do.
In computing, just-in-time (JIT) compilation is compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.
In compiler theory, dead-code elimination is a compiler optimization to remove dead code. Removing such code has several benefits: it shrinks program size, an important consideration in some contexts, it reduces resource usage such as the number of bytes to be transferred and it allows the running program to avoid executing irrelevant operations, which reduces its running time. It can also enable further optimizations by simplifying program structure. Dead code includes code that can never be executed, and code that only affects dead variables, that is, irrelevant to the program.
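A small before-and-after sketch of the two kinds of dead code, using hypothetical functions and written in Python purely for illustration, since the transformation is normally performed by a compiler:

def helper(x):
    # Hypothetical computation whose result is never used below.
    return sum(range(x % 1000))

def before(x):
    unused = helper(x)        # assigns only to a dead variable
    if False:                 # condition is always false: the branch can never run
        print("unreachable")
    return x + 1

def after(x):
    # After dead-code elimination: the unused call and the unreachable branch
    # are removed, leaving semantically equivalent but smaller and faster code.
    return x + 1

assert before(41) == after(41) == 42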
In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers.
SIGPLAN is the Association for Computing Machinery's Special Interest Group (SIG) on programming languages. This SIG explores programming language concepts and tools, focusing on design, implementation, practice, and theory. Its members are programming language developers, educators, implementers, researchers, theoreticians, and users.
Dynamic program analysis is the act of analyzing software that involves executing a program – as opposed to static program analysis, which does not execute it.
Thread Level Speculation (TLS), also known as Speculative Multi-threading, or Speculative Parallelization, is a technique to speculatively execute a section of computer code that is anticipated to be executed later in parallel with the normal execution on a separate independent thread. Such a speculative thread may need to make assumptions about the values of input variables. If these prove to be invalid, then the portions of the speculative thread that rely on these input variables will need to be discarded and squashed. If the assumptions are correct the program can complete in a shorter time provided the thread was able to be scheduled efficiently.
Gprof is a performance analysis tool for Unix applications. It used a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof is capable of limited call graph collecting and printing.
In computer programming, profile-guided optimization, also known as profile-directed feedback (PDF) or feedback-directed optimization (FDO), is the compiler optimization technique of using prior analyses of software artifacts or behaviors ("profiling") to improve the expected runtime performance of the program.
Incremental computing, also known as incremental computation, is a software feature which, whenever a piece of data changes, attempts to save time by only recomputing those outputs which depend on the changed data. When incremental computing is successful, it can be significantly faster than computing new outputs naively. For example, a spreadsheet software package might use incremental computation in its recalculation features, to update only those cells containing formulas which depend on the changed cells.
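A toy sketch of the spreadsheet-style idea, with hypothetical cells and explicit dependency sets: when one input changes, only the formulas that depend on it are recomputed.

# Toy incremental recalculation: recompute only cells whose inputs changed.
values = {"a": 1, "b": 2, "c": 3}
formulas = {
    "sum_ab": (lambda v: v["a"] + v["b"], {"a", "b"}),   # depends on a and b
    "double_c": (lambda v: 2 * v["c"], {"c"}),           # depends on c only
}
cache = {name: f(values) for name, (f, _) in formulas.items()}

def set_value(name, new_value):
    # Change one input and recompute only the dependent formulas.
    values[name] = new_value
    for out, (f, deps) in formulas.items():
        if name in deps:
            cache[out] = f(values)   # recomputed
        # cells not depending on `name` keep their cached value

set_value("a", 10)
print(cache)   # {'sum_ab': 12, 'double_c': 6}  (double_c was not recomputed)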
In computer science, empirical algorithmics is the practice of using empirical methods to study the behavior of algorithms. The practice combines algorithm development and experimentation: algorithms are not just designed, but also implemented and tested in a variety of situations. In this process, an initial design of an algorithm is analyzed so that the algorithm may be developed in a stepwise manner.
In engineering, debugging is the process of finding the root cause, workarounds and possible fixes for bugs.
Pin is a platform for creating analysis tools. A pin tool comprises instrumentation, analysis and callback routines. Instrumentation routines are called when code that has not yet been recompiled is about to be run, and enable the insertion of analysis routines. Analysis routines are called when the code associated with them is run. Callback routines are only called when specific conditions are met, or when a certain event has occurred. Pin provides an extensive application programming interface (API) for instrumentation at different abstraction levels, from one instruction to an entire binary module. It also supports callbacks for many events such as library loads, system calls, signals/exceptions and thread creation events.
For several years parallel hardware was available only for distributed computing, but recently it has become available for low-end computers as well, making it increasingly necessary for software programmers to write parallel applications. Programmers tend to think sequentially and are therefore less acquainted with writing multi-threaded or parallel applications, which require handling issues such as synchronization and deadlock avoidance and thus demand expertise beyond the application domain. As a result, programmers prefer to write sequential code, which most popular programming languages support and which lets them concentrate on the application itself. There is therefore a need to convert such sequential applications into parallel applications with the help of automated tools; the need is also non-trivial because a large amount of legacy code written over the past few decades must be reused and parallelized.
Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.
DynamoRIO is a BSD-licensed dynamic binary instrumentation framework for the development of dynamic program analysis tools. DynamoRIO targets user space applications under the Android, Linux, and Windows operating systems running on the AArch32, IA-32, and x86-64 instruction set architectures.
Differential testing, also known as differential fuzzing, is a software testing technique that detects bugs by providing the same input to a series of similar applications and observing differences in their execution. Differential testing complements traditional software testing because it is well suited to finding semantic or logic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Differential testing is also called back-to-back testing.
gprof Output (archived 2012-05-29 at the Wayback Machine)