Call graph

Last updated
A call graph generated for a simple computer program in Python. A Call Graph generated by pycallgraph.png
A call graph generated for a simple computer program in Python.

A call graph (also known as a call multigraph [1] [2] ) is a control-flow graph, [3] which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge (f, g) indicates that procedure f calls procedure g. Thus, a cycle in the graph indicates recursive procedure calls.

Contents

Basic concepts

Call graphs can be dynamic or static. [4] A dynamic call graph is a record of an execution of the program, for example as output by a profiler. Thus, a dynamic call graph can be exact, but only describes one run of the program. A static call graph is a call graph intended to represent every possible run of the program. The exact static call graph is an undecidable problem, so static call graph algorithms are generally overapproximations. That is, every call relationship that occurs is represented in the graph, and possibly also some call relationships that would never occur in actual runs of the program.

Call graphs can be defined to represent varying degrees of precision. A more precise call graph more precisely approximates the behavior of the real program, at the cost of taking longer to compute and more memory to store. The most precise call graph is fully context-sensitive, which means that for each procedure, the graph contains a separate node for each call stack that procedure can be activated with. A fully context-sensitive call graph is called a calling context tree. This can be computed dynamically easily, although it may take up a large amount of memory. Calling context trees are usually not computed statically, because it would take too long for a large program. The least precise call graph is context-insensitive, which means that there is only one node for each procedure.

With languages that feature dynamic dispatch (i.e. Java or C++), [5] first-class functions (i.e. Python or Racket), or function pointers (i.e. C), computing a static call graph precisely requires alias analysis results. Conversely, computing precise aliasing requires a call graph. Many static analysis systems solve the apparent infinite regress by computing both simultaneously.

Usages

Call graphs can be used in different ways. One simple application of call graphs is finding procedures that are never called. Call graphs can act as documentation for humans to understand programs. [6] Call graphs can also be used to detect anomalies of program execution or code injection attacks. [7]

Software

Free software call graph generators

Run-time call graph (most of tools listed are profilers with call graph functionality)

  • gprof  : included in BSD or part of the GNU Binary Utilities
  • callgrind : part of Valgrind
  • KCachegrind  : powerful tool to generate and analyze call graphs based on data generated by callgrind
  • Mac OS X Activity Monitor : Apple GUI process monitor Activity Monitor has a built-in call graph generator that can sample processes and return a call graph. This function is only available in Mac OS X Leopard
  • OpenPAT : includes the control_flow tool which automatically creates a Graphviz call-graph picture from runtime measurements.
  • pprof, open source tool for visualization and analysis of profile data, to be used in conjunction with gperftools.
  • CodeAnalyst from AMD (released under GPL)
  • makeppgraph is a dependency graph generator (at module level) for builds performed with makepp.
  • Intel(R) Single Event API (free, open-source)

Static for getting call graphs without running application

C/C++
  • Sourcetrail creates a static call graph, that can be dynamically explored by the user. Also supports Python and Java
  • doxygen  : Uses Graphviz to generate static call/inheritance diagrams
  • Cally: a tool that uses GCC's Register Transfer Language (RTL) files to build a caller or callee call graphs for C projects.
  • cflow  : GNU cflow is able to generate the direct and inverted call graph of a C program
  • egypt  : a small Perl script that uses gcc and Graphviz to generate the static call graph of a C program.
  • Analizo: calculates source code metrics, generates dependency graphs.
  • CCTree  : Native Vim plugin that can display static call graphs by reading a cscope database. Works for C programs.
  • codeviz  : a static call graph generator (the program is not run). Implemented as a patch to gcc; works for C and C++ programs.
  • calltree.sh  : Bash shell functions that glue together cscope, graphviz, and a sampling of dot-rendering tools to display "caller" and "callee" relationships above, below, and/or between the C functions you specify.
  • tceetree  : like calltree.sh, it connects Cscope and Graphviz, but it is an executable rather than a bash script.
Go
  • go-callvis  : an interactive call graph generator for Go programs whose output can be drawn with Graphviz
Multi-language
  • callGraph  : open-source call graph generator for awk, bash, basic, dart, fortran, go, lua, javascript, julia, kotlin, matlab, perl, pascal, php, python, R, raku, ruby, rust, scala, swift, tcl, and typescript.
.NET
  • NDepend :is a static analysis tool for .NET code. This tool supports a large number of code metrics, allows for visualization of dependencies using directed graphs and dependency matrix.
PHP, Perl and Python
  • Devel::NYTProf  : a Perl performance analyser and call chart generator
  • phpCallGraph  : a call graph generator for PHP programs that uses Graphviz. It is written in PHP and requires at least PHP 5.2.
  • pycallgraph  : a call graph generator for Python programs that uses Graphviz.
  • pyan  : a static call graph generator for Python programs that uses Graphviz.
  • gprof2dot  : A call graph generator written in Python that converts profiling data for many languages/runtimes to a Graphviz callgraph.
  • code2flow: A call graph generator for Python and Javascript programs that uses Graphviz
  • rcviz  : Python module for rendering runtime-generated call graphs with Graphviz. Each node represents an invocation of a function with the parameters passed to it and the return value.
XQuery

Proprietary call graph generators

LDRA Testbed
Static and dynamic analysis engines for both host and embedded software, with a myriad of reports including call graphs.
Project Analyzer
Static code analyzer and call graph generator for Visual Basic code
Visual Expert
Static code analyzer and call graph generator for Oracle PL/SQL, SQLServer Transact-SQL, C# and PowerBuilder code
Intel VTune Performance Analyzer
Instrumenting profiler to show call graph and execution statistics
DMS Software Reengineering Toolkit
Customizable program analysis tool with static whole-program global call graph extraction for C, Java and COBOL
Graphviz
Turns a text representation of any graph (including a call graph) into a picture.
tsort
Command-line utility that performs a topological sort.

Sample graph

A sample call graph generated from gprof analyzing itself:

index    called     name                              |index    called     name       72384/72384       sym_id_parse [54]             |       1508/1508        cg_dfn [15] [3]   72384             match [3]                     |[13]   1508             pre_visit [13] ----------------------                                |----------------------           4/9052        cg_tally [32]                 |       1508/1508        cg_assemble [38]        3016/9052        hist_print [49]               |[14]   1508             propagate_time [14]        6032/9052        propagate_flags [52]          |---------------------- [4]    9052             sym_lookup [4]                |          2             cg_dfn [15] ----------------------                                |       1507/1507        cg_assemble [38]        5766/5766        core_create_function_syms [41]|[15]   1507+2           cg_dfn [15] [5]    5766             core_sym_class [5]            |       1509/1509        is_numbered [9] ----------------------                                |       1508/1508        is_busy [11]          24/1537        parse_spec [19]               |       1508/1508        pre_visit [13]        1513/1537        core_create_function_syms [41]|       1508/1508        post_visit [12] [6]    1537             sym_init [6]                  |          2             cg_dfn [15] ----------------------                                |----------------------        1511/1511        core_create_function_syms [41]|       1505/1505        hist_print [49] [7]    1511             get_src_info [7]              |[16]   1505             print_line [16] ----------------------                                |          2/9           print_name_only [25]           2/1510        arc_add [31]                  |----------------------        1508/1510        cg_assemble [38]              |       1430/1430        core_create_function_syms [41] [8]    1510             arc_lookup [8]                |[17]   1430             source_file_lookup_path [17] ----------------------                                |----------------------        1509/1509        cg_dfn [15]                   |         24/24          sym_id_parse [54] [9]    1509             is_numbered [9]               |[18]     24             parse_id [18] ----------------------                                |         24/24          parse_spec [19]        1508/1508        propagate_flags [52]          |---------------------- [10]   1508             inherit_flags [10]            |         24/24          parse_id [18] ----------------------                                |[19]     24             parse_spec [19]        1508/1508        cg_dfn [15]                   |         24/1537        sym_init [6] [11]   1508             is_busy [11]                  |---------------------- ----------------------                                |         24/24          main [1210]        1508/1508        cg_dfn [15]                   |[20]     24             sym_id_add [20] [12]   1508             post_visit [12]               | 

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

Yacc is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser based on a formal grammar, written in a notation similar to Backus–Naur Form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

Lexical tokenization is conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for large language model's data preprocessing, that encode text into numerical tokens, using byte pair encoding.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

Lex is a computer program that generates lexical analyzers.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can be combined to accomplish a task, much as one might use multiple hands to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer, often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

Flex is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers . It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems, or together with GNU bison in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License, although a manual for Flex was produced and published by the Free Software Foundation.

<span class="mw-page-title-main">Doxygen</span> Free software for generating software documentation from source code

Doxygen is a documentation generator and static analysis tool for software source trees. When used as a documentation generator, Doxygen extracts information from specially-formatted comments within the code. When used for analysis, Doxygen uses its parse tree to generate diagrams and charts of the code structure. Doxygen can cross reference documentation and code, so that the reader of a document can easily refer to the actual code.

<span class="mw-page-title-main">Graphviz</span> Software package for graph visualization

Graphviz is a package of open-source tools initiated by AT&T Labs Research for drawing graphs specified in DOT language scripts having the file name extension "gv". It also provides libraries for software applications to use the tools. Graphviz is free software licensed under the Eclipse Public License.

DOT is a graph description language, developed as a part of the Graphviz project. DOT graphs are typically stored as files with the .gv or .dot filename extension — .gv is preferred, to avoid confusion with the .dot extension used by versions of Microsoft Word before 2007. dot is also the name of the main program to process DOT files in the Graphviz package.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.

cscope is a programming tool which works in console mode, text-based interface, that allows computer programmers or software developers to search source code of the programming language C, with some support for C++ and Java. It is often used on very large projects to aid code comprehension to find source code, functions, declarations, definitions and regular expressions given a text string. cscope is free and released under a BSD license. The original developer of cscope is Joe Steffen.

Gprof is a performance analysis tool for Unix applications. It used a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof is capable of limited call graph collecting and printing.

<span class="mw-page-title-main">History of compiler construction</span>

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

For several years parallel hardware was only available for distributed computing but recently it is becoming available for the low end computers as well. Hence it has become inevitable for software programmers to start writing parallel applications. It is quite natural for programmers to think sequentially and hence they are less acquainted with writing multi-threaded or parallel processing applications. Parallel programming requires handling various issues such as synchronization and deadlock avoidance. Programmers require added expertise for writing such applications apart from their expertise in the application domain. Hence programmers prefer to write sequential code and most of the popular programming languages support it. This allows them to concentrate more on the application. Therefore, there is a need to convert such sequential applications to parallel applications with the help of automated tools. The need is also non-trivial because large amount of legacy code written over the past few decades needs to be reused and parallelized.

Google Kythe is a source code indexer and cross-referencer for code comprehension which describes itself as a "pluggable, (mostly) language-agnostic ecosystem for building tools that work with code".

Visual Expert is a static code analysis tool, extracting design and technical information from software source code by reverse-engineering, used by programmers for software maintenance, modernization or optimization.

References

  1. Callahan, D.; Carle, A.; Hall, M. W.; Kennedy, K. (April 1990). "Constructing the procedure call multigraph". IEEE Transactions on Software Engineering. 16 (4): 483–487. doi:10.1109/32.54302.
  2. Uday Khedker; Amitabha Sanyal; Bageshri Sathe (2009). Data Flow Analysis: Theory and Practice. CRC Press. p. 234. ISBN   978-0-8493-3251-7.
  3. Pankaj Jalote (1997). An Integrated Approach to Software Engineering . Springer Science & Business Media. p.  372. ISBN   978-0-387-94899-7.
  4. Ryder, B.G. (May 1979). "Constructing the Call Graph of a Program". IEEE Transactions on Software Engineering. SE-5 (3): 216–226. doi:10.1109/tse.1979.234183. S2CID   16527042.
  5. Grove, David; DeFouw, Greg; Dean, Jeffrey; Chambers, Craig; Grove, David; DeFouw, Greg; Dean, Jeffrey; Chambers, Craig (9 October 1997). "Call graph construction in object-oriented languages". ACM SIGPLAN Notices. ACM. 32 (10): 108, 108–124, 124. doi: 10.1145/263700.264352 .
  6. Eisenbarth, T.; Koschke, R.; Simon, D. (2001). "Aiding program comprehension by static and dynamic feature analysis". Proceedings IEEE International Conference on Software Maintenance. ICSM 2001. pp. 602–611. doi:10.1109/icsm.2001.972777. ISBN   0-7695-1189-9. S2CID   5934718.
  7. Gao, Debin; Reiter, Michael K.; Song, Dawn (25 October 2004). "Gray-box extraction of execution graphs for anomaly detection". Proceedings of the 11th ACM conference on Computer and communications security - CCS '04. ACM. pp. 318–329. doi:10.1145/1030083.1030126. ISBN   1581139616. S2CID   1189805.