Debug symbol


A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable. This information allows a symbolic debugger to access information from the source code from which the binary was built, such as the names of identifiers, including variables and routines.


The symbolic information may be embedded in the module's binary file, distributed in a separate file, or simply discarded during compilation or linking.

This information is helpful when investigating and fixing a crashing application or any other fault. [1]

Embedded symbols

Debug symbols typically include not only the name of a function or global variable but also the name of the source file in which the symbol occurs and the line number at which it is defined. Other information includes the symbol's type (integer, float, function, exception, etc.), its scope (block scope or global scope), its size, and, for classes, the class name and the methods and members it contains. All of this additional information can take up considerable space, especially the filenames and line numbers, so binaries with debug symbols can become quite large, often several times the size of the stripped file. [2] To avoid this overhead, most operating system distributions ship stripped binaries, i.e. binaries from which all of the debugging symbols have been removed. On Unix, for example, this is done with the strip command.
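The kinds of information listed above can be pictured as extra fields attached to each symbol-table entry. The following Python sketch is a hypothetical model, not the layout of any real object-file format such as ELF or PDB; the `Symbol` class and the `strip_debug_info` function are invented for illustration. Stripping here clears only the debug-only fields and keeps the plain name, address and size, which corresponds to removing debugging information alone; by default, the strip command may remove more than this, including the ordinary symbol names.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol-table entry; real formats (ELF, PDB, ...) differ."""
    name: str
    address: int
    size: int
    # Debug-only fields: populated in an unstripped binary.
    source_file: Optional[str] = None
    line: Optional[int] = None
    type_name: Optional[str] = None  # e.g. "int", "float", "function"
    scope: Optional[str] = None      # e.g. "global" or "block"


def strip_debug_info(symbols: list[Symbol]) -> list[Symbol]:
    """Return copies of the entries with every debug-only field cleared."""
    return [
        replace(s, source_file=None, line=None, type_name=None, scope=None)
        for s in symbols
    ]


if __name__ == "__main__":
    table = [Symbol("main", 0x1000, 64, "main.c", 12, "function", "global")]
    print(strip_debug_info(table))
```

The original list is left untouched because the entries are immutable, so a "debug" and a "stripped" view of the same table can coexist.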

Some compilers write the symbolic debugging information to a separate file rather than placing it in the binary itself.
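When debug information lives in a separate file, the debugger has to find it again. As one concrete example, GDB on GNU/Linux can locate a separate debug file from the binary's build ID: under a global debug directory, conventionally /usr/lib/debug, it looks for `.build-id/xx/rest.debug`, where `xx` is the first two hex digits of the ID and `rest` is the remainder. The sketch below only builds that path. The function name `build_id_debug_path` is hypothetical, and it assumes the build ID has already been extracted from the binary (for example, as printed by `readelf -n`).

```python
from pathlib import PurePosixPath


def build_id_debug_path(build_id: bytes, debug_dir: str = "/usr/lib/debug") -> str:
    """Path where a GDB-style lookup expects the separate debug file for build_id."""
    hex_id = build_id.hex()  # lowercase hex digits
    if len(hex_id) < 3:
        raise ValueError("build ID is too short to split into directory and file name")
    # First two hex digits name a subdirectory; the rest plus ".debug" names the file.
    return str(PurePosixPath(debug_dir, ".build-id", hex_id[:2], hex_id[2:] + ".debug"))


if __name__ == "__main__":
    print(build_id_debug_path(bytes.fromhex("abcdef0123")))
```

Keying the lookup on a build ID rather than on a file name means a debug file is only matched with the exact build it was produced from.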

SysV ABI

The SysV application binary interface (ABI) includes a specification for the format of debug symbols. This allows any compatible compiler or assembler to create debug symbols in a standardized format, and any debugger, such as the GNU Debugger (GDB), to access and display these symbols. Important debug information includes, for example, the line in the source file that defines a symbol (a function or global variable), as well as symbols associated with exception frames.
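A debugger uses line information to map a machine-code address back to a source location. The sketch below is a simplified, hypothetical table loosely inspired by the rows that DWARF line-number programs produce; the real encoding is a compact state machine, not a Python list, and the names `LineRow` and `LineTable` are illustrative. Each row is assumed to cover addresses from its own start up to the start of the next row, so a lookup is a binary search for the last row at or before the address.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineRow:
    address: int  # first machine-code address covered by this row
    file: str
    line: int


class LineTable:
    """Simplified address-to-source-line table (hypothetical layout)."""

    def __init__(self, rows: list[LineRow], end_address: int):
        self._rows = sorted(rows, key=lambda r: r.address)
        self._starts = [r.address for r in self._rows]
        self._end = end_address  # one past the last covered address

    def lookup(self, address: int) -> Optional[LineRow]:
        # The covering row is the last one starting at or before `address`.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0 or address >= self._end:
            return None  # address lies outside the table
        return self._rows[i]


if __name__ == "__main__":
    table = LineTable(
        [LineRow(0x1000, "main.c", 10), LineRow(0x1008, "main.c", 11),
         LineRow(0x1020, "util.c", 3)],
        end_address=0x1040,
    )
    print(table.lookup(0x1004))
```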

Microsoft debug symbols

Microsoft compilers generate a program database (PDB) file containing debug symbols. Some companies ship PDB files on their CD/DVD media to enable troubleshooting, while others (such as Microsoft and the Mozilla Corporation) allow debug symbols to be downloaded from the Internet. The WinDbg debugger and the Visual Studio IDE can be configured to download debug symbols for Windows dynamic-link libraries (DLLs) automatically on demand. The PDB debug symbols that Microsoft distributes include only public functions, global variables, and their data types. The Mozilla Corporation has similar infrastructure but distributes full debug information.
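Downloading symbols on demand requires a way to name the exact PDB that matches a given binary. As far as we understand the SymSrv convention used by WinDbg and Visual Studio, a PDB is stored at `<server>/<pdb name>/<GUID><age>/<pdb name>`, where the GUID is written as 32 uppercase hex digits without dashes and the age (a counter stored alongside the GUID) follows in uppercase hex. The function `symbol_store_url` below is a hypothetical helper that only formats such a URL; it assumes the GUID and age have already been read from the binary's debug directory and does not perform any download.

```python
import uuid

# Microsoft's public symbol server; servers following the SymSrv layout differ only in the base URL.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_store_url(pdb_name: str, guid: uuid.UUID, age: int,
                     server: str = MS_SYMBOL_SERVER) -> str:
    """Format a SymSrv-style URL for a PDB (sketch; does not download anything)."""
    # Key directory: GUID as 32 uppercase hex digits (no dashes) + age in uppercase hex.
    key = guid.hex.upper() + format(age, "X")
    return f"{server.rstrip('/')}/{pdb_name}/{key}/{pdb_name}"


if __name__ == "__main__":
    g = uuid.UUID("12345678-9abc-def0-1234-56789abcdef0")
    print(symbol_store_url("ntdll.pdb", g, 1))
```

If the GUID is taken as 16 raw bytes from a binary rather than from its text form, `uuid.UUID(bytes_le=...)` is the appropriate constructor, because a Windows GUID stores its first three fields in little-endian order.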

Both Microsoft and Mozilla also offer the source code (Microsoft provides certain components, such as most of the .NET Framework, whereas Mozilla offers full source) to make debugging easier.

Apple

On Apple platforms, debug symbols are optionally emitted during the build process as dSYM files. Apple uses the term "symbolicate" to refer to replacing addresses in diagnostic files with human-readable values. [3]
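Symbolication can be sketched as annotating each raw address in a report with the nearest preceding symbol plus an offset. The Python below is a hypothetical illustration, not Apple's tooling: the symbol map is assumed to have been extracted from the dSYM beforehand as `(start_address, name)` pairs, and the addresses in the report are assumed to be already adjusted for the binary's load address. Real symbolication also uses symbol sizes and line tables, which this sketch omits, so an address past the end of the last function would still be attributed to it.

```python
from __future__ import annotations

import bisect
import re

ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def symbolicate(report: str, symbols: list[tuple[int, str]]) -> str:
    """Annotate each hex address in `report` with "name + offset" (hypothetical sketch)."""
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]

    def annotate(match: re.Match) -> str:
        text = match.group(0)
        address = int(text, 16)
        # Nearest symbol starting at or before the address.
        i = bisect.bisect_right(starts, address) - 1
        if i < 0:
            return text  # no symbol precedes it: leave the address unchanged
        start, name = ordered[i]
        return f"{text} {name} + {address - start}"

    return ADDRESS.sub(annotate, report)


if __name__ == "__main__":
    syms = [(0x1000A0000, "main"), (0x1000A0100, "helper")]
    print(symbolicate("0 MyApp 0x1000a0010\n1 MyApp 0x1000a0104", syms))
```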


History

Symbolic debuggers have existed since the mainframe era, almost since suitable computer displays for showing symbolic debugging information were first introduced (and even earlier in the form of symbolic dumps on paper). They were not restricted to high-level compiled languages and were also available for assembly language programs. For the IBM System/360, language translators could, on request, produce object code that included "SYM cards". These were usually ignored by the program loader but were useful to a symbolic debugger, because they were kept in the same program library as the executable code.

References

  1. "Debugging with Symbols". Windows Dev Center. Microsoft. Archived from the original on 2020-01-11. Retrieved 2020-01-11.
  2. "What are Symbols For?". TechNet. Microsoft. 2008-07-15. Archived from the original on 2014-12-26. Retrieved 2015-01-04.
  3. "Understanding and Analyzing iOS Application Crash Reports". iOS Developer Library. Apple, Inc. 2018-01-08 [2009-01-29]. Technical Note TN2151. Archived from the original on 2019-12-19. Retrieved 2020-01-11.