Debug symbol

Last updated

A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable. This information allows a symbolic debugger to gain access to information from the source code of the binary, such as the names of identifiers, including variables and routines.

Contents

The symbolic information may be compiled together with the module's binary file, or distributed in a separate file, or simply discarded during the compilation and/or linking.

This information can be helpful while trying to investigate and fix a crashing application or any other fault. [1]

Debugging information

Debug symbols typically include not only the name of a function or global variable, but also the name of the source code file in which the symbol occurs, as well as the line number at which it is defined. Other information includes the type of the symbol (integer, float, function, exception, etc.), the scope (block scope or global scope), the size, and, for classes, the name of the class, and the methods and members in it.

Part of the debug information includes ithe line of code in the source file which defines that symbol (a function or global variable), as well as symbols associated with exception frames.

This information may be stored in the symbol table of an object file, executable file, or shared library, or may be in a separate file.

On some systems, e.g., z/OS, the debug information contains more than just the symbol tabled, e.g., the ADATA discussed in § OS/390 et al contains source code.

Debugging information can take up quite a bit of space, especially the filenames and line numbers. Thus, binaries with debug symbols can become quite large, often several times the stripped file size. [2] To avoid this extra size, most operating system distributions ship binaries that are stripped, i.e. from which all of the debugging symbols have been removed. This is accomplished, for example, with the strip command in Unix. If the debugging information is in separate files, those files are usually not shipped with the distribution.

Embedded symbols

Unix-like systems

stabs was an early format for debugging symbols on Unix-like systems. The newer DWARF format, for which formal specifications exist, has largely supplanted it. The specification allows any compatible compiler or assembler to create debug symbols in a standardized format, and for any debugger, such as the GNU Debugger (GDB), to gain access and display these symbols.

IBM

The compilers for the IBM mainframe line descended from the System/360 have a TEST option that causes the compiler to include debugging information [3] [4] [5] in the object file. Similarly, the linkage editors hsve a TEST option that causes the debug information to be retained [6] in the load module. Various debug tools, e.g., OS/360 TESTRAN, TSO TEST, have the ability to use the embedded symbol definitions.

External debug files

OS/390 et al

The IBM High Level Assembler (HLASM) and other compilers running on, e.g., z/OS, have an ADATA option that produces an Associated data (ADATA) file [7] containing more information than that produced by the old TEST option. In particular, the ADATA file includes lines of source code and their metadata.

Microsoft debug symbols

Microsoft compilers generate a program database (PDB) file containing debug symbols. Some companies ship the PDB on their CD/DVD to enable troubleshooting and other companies (like Microsoft, and the Mozilla Corporation) allow downloading debug symbols from the Internet. The WinDbg debugger and the Visual Studio IDE can be configured to automatically download debug symbols for Windows dynamic-link libraries (DLLs) on demand. The PDB debug symbols that Microsoft distributes include only public functions, global variables and their data types. The Mozilla Corporation has similar infrastructure but distributes full debug information.

Apple

On Apple platforms, debug symbols are optionally emitted during the build process as dSYM files. Apple uses the term "symbolicate" to refer to the replacement of addresses in diagnostic files with human readable values. [8]

History

Symbolic debuggers have existed since the mainframe era, almost since the first introduction of suitable computer displays on which to display the symbolic debugging information (and even earlier with symbolic dumps on paper). They were not restricted to high level compiled languages and were available also for assembly language programs. For the IBM/360, these produced object code (on request) that included "SYM cards". These were usually ignored by the program loader but were useful to a symbolic debugger as they were kept on the same program library as the executable logic code.

See also

Related Research Articles

<span class="mw-page-title-main">Assembly language</span> Low-level programming language

In computer programming, assembly language, often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence between the instructions in the language and the architecture's machine code instructions. Assembly language usually has one statement per machine instruction (1:1), but constants, comments, assembler directives, symbolic labels of, e.g., memory locations, registers, and macros are generally also supported.

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

<span class="mw-page-title-main">Machine code</span> Set of instructions executed by a computer

In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers machine code is "the binary representation of a computer program which is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions ."

PL/I is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has been in continuous use by academic, commercial and industrial organizations since it was introduced in the 1960s.

In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.

<span class="mw-page-title-main">Library (computing)</span> Collection of resources used to develop a computer program

In computer science, a library is a collection of read-only resources that is leveraged during software development to implement a computer program.

.doc is a filename extension used for word processing documents stored on Microsoft's proprietary Microsoft Word Binary File Format; it was the primary format for Microsoft Word until the 2007 version replaced it with Office Open XML .docx files. Microsoft has used the extension since 1983.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

Adabas, a contraction of “adaptable database system," is a database package that was developed by Software AG to run on IBM mainframes. It was launched in 1971 as a non-relational database. As of 2019, Adabas is marketed for use on a wider range of platforms, including Linux, Unix, and Windows.

dbx is a source-level debugger found primarily on Solaris, AIX, IRIX, Tru64 UNIX, Linux and BSD operating systems. It provides symbolic debugging for programs written in C, C++, Fortran, Pascal and Java. Useful features include stepping through programs one source line or machine instruction at a time. In addition to simply viewing operation of the program, variables can be manipulated and a wide range of expressions can be evaluated and displayed.

CodeView is a standalone debugger created by David Norris at Microsoft in 1985 as part of its development toolset. It originally shipped with Microsoft C 4.0 and later. It also shipped with Visual Basic for MS-DOS, Microsoft BASIC PDS, and a number of other Microsoft language products. It was one of the first debuggers for MS-DOS to be full-screen oriented, rather than line-oriented.

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format (ELF), although it is independent of object file formats. The name is a medieval fantasy complement to "ELF" that had no official meaning, although the name "Debugging With Arbitrary Record Formats" has since been proposed as a backronym.

WinDbg is a multipurpose debugger for the Microsoft Windows computer operating system, distributed by Microsoft. Debugging is the process of finding and resolving errors in a system; in computing it also includes exploring the internal operation of software as a help to development. It can be used to debug user mode applications, device drivers, and the operating system itself in kernel mode.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

Program database (PDB) is a file format for storing debugging information about a program. PDB files commonly have a .pdb extension. A PDB file is typically created from source files during compilation. It stores a list of all symbols in a module with their addresses and possibly the name of the file and the line on which the symbol was declared. This symbol information is not stored in the module itself, because it takes up a lot of space.

Plus is a "Pascal-like" system implementation language from the University of British Columbia (UBC), Canada, based on the SUE system language developed at the University of Toronto, c. 1971.

The OS/360 Object File Format is the standard object module file format for the IBM DOS/360, OS/360 and VM/370, Univac VS/9, and Fujitsu BS2000 mainframe operating systems. In the 1990s, the format was given an extension with the XSD-type record for the MVS Operating System to support longer module names in the C Programming Language. This format is still in use by the z/VSE operating system. In contrast, it has been superseded by the GOFF file format on the MVS Operating System and on the z/VM Operating System. Since the MVS and z/VM loaders will still handle this older format, some compilers have chosen to continue to produce this format instead of the newer GOFF format.

The GOFF specification was developed for IBM's MVS operating system to supersede the IBM OS/360 Object File Format to compensate for weaknesses in the older format.

References

  1. "Debugging with Symbols". Windows Dev Center. Microsoft. Archived from the original on 2020-01-11. Retrieved 2020-01-11.
  2. "What are Symbols For?". TechNet . Microsoft. 2008-07-15. Archived from the original on 2014-12-26. Retrieved 2015-01-04.
  3. "Appendix D: TESTRAN Editor Input Record Formats" (PDF). IBM System/360 Operating System - TESTRAN - Program Logic Manual - Program Number 3605-PT-516 (PDF). TNL GN26-8016. IBM. 1971-04-01. pp. 119–120. GY28-6611-0. Retrieved 2024-07-11.
  4. "Appendix. Input conventions and Record Formats" (PDF). MVS/370 - Linkage Editor Logic - Data Facility Product 5665-295 - Release 1.0 (PDF) (First ed.). IBM. April 1983. pp. 195–206. LY26-3921-0. Retrieved 2024-07-11.
  5. LY26-3921-0, p. 195, Figure 69. SYM Input Record (Card Image).
  6. LY26-3921-0, p. 199, Figure 76. SYM Record (Load Module).
  7. "Appendix C. Associated data file output" (PDF). High Level Assembler for z/OS & z/VM & z/VSE - Programmer's Guide - Version 1 Release 6 (PDF). IBM. 2015. pp. 227–275. SC26-4941-07. Retrieved 2024-07-11.
  8. "Understanding and Analyzing iOS Application Crash Reports". iOS Developer Library. Apple, Inc. 2018-01-08 [2009-01-29]. Technical Note TN2151. Archived from the original on 2019-12-19. Retrieved 2020-01-11.