DWARF

Last updated

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format (ELF), although it is independent of object file formats. [1] The name is a medieval fantasy complement to "ELF" that had no official meaning, although the name "Debugging With Arbitrary Record Formats" has since been proposed as a backronym. [1]

Contents

DWARF originated with the C compiler and sdb debugger in Unix System V Release 4 (SVR4). [1]

History

The first version of DWARF proved to use excessive amounts of storage, and an incompatible successor, DWARF-2, superseded it and added various encoding schemes to reduce data size. DWARF did not immediately gain universal acceptance; for instance, when Sun Microsystems adopted ELF as part of their move to Solaris, they opted to continue using stabs, in an embedding known as "stabs-in-elf". Linux followed suit, and DWARF-2 did not become the default until the late 1990s.

The DWARF Workgroup of the Free Standards Group released DWARF version 3 in January 2006, [2] adding (among other things) support for C++ namespaces, Fortran 90 allocatable data and additional compiler optimization techniques.

The DWARF committee published version 4 of DWARF, which offers "improved data compression, better description of optimized code, and support for new language features in C++", in 2010. [3]

Version 5 of the DWARF format was published in February 2017. [4] [5] It "incorporates improvements in many areas: better data compression, separation of debugging data from executable files, improved description of macros and source files, faster searching for symbols, improved debugging of optimized code, as well as numerous improvements in functionality and performance."

Structure

DWARF uses a data structure called a Debugging Information Entry (DIE) to represent each variable, type, procedure, etc. A DIE has a tag (e.g., DW_TAG_variable, DW_TAG_pointer_type, DW_TAG_subprogram) and attributes (key-value pairs). A DIE can have nested (child) DIEs, forming a tree structure. A DIE attribute can refer to another DIE anywhere in the tree—for instance, a DIE representing a variable would have a DW_AT_type entry pointing to the DIE describing the variable's type.

To save space, two large tables needed by symbolic debuggers are represented as byte-coded instructions for simple, special-purpose finite-state machines. The Line Number Table, which maps code locations to source code locations and vice versa, also specifies which instructions are part of function prologues and epilogues. The Call Frame Information table allows debuggers to locate frames on the call stack.

DWARF has been divided into different sections like .debug_info, [6] .debug_frame, etc.

Tools

Libdwarf is a library that provides access to the DWARF debugging information in executable files and object files. [7]

Further reading

Michael Eager, chair of the DWARF Standards Committee, has written an introduction to debugging formats and DWARF 3, Introduction to the DWARF Debugging Format. [1]

Related Research Articles

<span class="mw-page-title-main">Common Lisp</span> Programming language standard

Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ANSI INCITS 226-1994 (S2018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

<span class="mw-page-title-main">GNU Debugger</span> Source-level debugger

The GNU Debugger (GDB) is a portable debugger that runs on many Unix-like systems and works for many programming languages, including Ada, Assembly, C, C++, D, Fortran, Haskell, Go, Objective-C, OpenCL C, Modula-2, Pascal, Rust, and partially others.

<span class="mw-page-title-main">Machine code</span> Lowest level instructions executed by a computer

In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binary representation of a computer program which is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions.

<span class="mw-page-title-main">PNG</span> Family of lossless-compression image file formats

Portable Network Graphics is a raster-graphics file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for Graphics Interchange Format (GIF)—unofficially, the initials PNG stood for the recursive acronym "PNG's not GIF".

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

The Portable Executable (PE) format is a file format for executables, object code, dynamic-link-libraries (DLLs), and binary files used on 32-bit and 64-bit Windows operating systems, as well as in UEFI environments. It is the standard format for executables on Windows NT-based systems, including files such as .exe, .dll, .sys, and .mui. At its core, the PE format is a structured data container that gives the Windows operating system loader eveything it needs to properly manage the executable code it contains. This includes references for dynamically linked libraries, tables for importing and exporting APIs, resource management data and thread-local storage (TLS) information.

Tag Image File Format or Tagged Image File Format, commonly known by the abbreviations TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by scanning, faxing, word processing, optical character recognition, image manipulation, desktop publishing, and page-layout applications. The format was created by the Aldus Corporation for use in desktop publishing. It published the latest version 6.0 in 1992, subsequently updated with an Adobe Systems copyright after the latter acquired Aldus in 1994. Several Aldus or Adobe technical notes have been published with minor extensions to the format, and several specifications have been based on TIFF 6.0, including TIFF/EP, TIFF/IT, TIFF-F and TIFF-FX.

OpenEXR is a high-dynamic range, multi-channel raster file format, released as an open standard along with a set of software tools created by Industrial Light & Magic (ILM), under a free software license similar to the BSD license.

<span class="mw-page-title-main">Executable</span> A file that causes a computer to follow indicated instructions

In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file that must be interpreted (parsed) by an interpreter to be functional.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

In computer programming, a magic number is any of the following:

a.out is a file format used in older versions of Unix-like computer operating systems for executables, object code, and, in later systems, shared libraries. This is an abbreviated form of "assembler output", the filename of the output of Ken Thompson's PDP-7 assembler. The term was subsequently applied to the format of the resulting file to contrast with other formats for object code.

Minification is the process of removing all unnecessary characters from the source code of interpreted programming languages or markup languages without changing its functionality. These unnecessary characters usually include whitespace characters, new line characters, comments, and sometimes block delimiters, which are used to add readability to the code but are not required for it to execute. Minification reduces the size of the source code, making its transmission over a network more efficient. In programmer culture, aiming at extremely minified source code is the purpose of recreational code golf competitions.

A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable. This information allows a symbolic debugger to gain access to information from the source code of the binary, such as the names of identifiers, including variables and routines.

A debugging data format is a means of storing information about a compiled computer program for use by high-level debuggers. Modern debugging data formats store enough information to allow source-level debugging.

Intel Fortran Compiler, as part of Intel OneAPI HPC toolkit, is a group of Fortran compilers from Intel for Windows, macOS, and Linux.

LEB128 or Little Endian Base 128 is a variable-length code compression used to store arbitrarily large integers in a small number of bytes. LEB128 is used in the DWARF debug file format and the WebAssembly binary encoding for all integer literals.

References

  1. 1 2 3 4 Michael J. Eager (April 2012). "Introduction to the DWARF Debugging Format" (PDF). Retrieved 2015-01-08.
  2. "DWARF Version 3 Standard Released" (Press release). Free Standards Group. January 4, 2006. Archived from the original on 2011-07-25. Retrieved 2007-06-25.
  3. "DWARF Version 4 Released". The DWARF committee. June 16, 2010. Archived from the original on 2020-07-30. Retrieved 2010-06-24.
  4. "DWARF Version 5 Standard Released". The DWARF committee. February 15, 2017. Retrieved 2017-08-07.
  5. "DWARF 5 Standard". The DWARF committee. February 15, 2017. Retrieved 2017-08-07.
  6. .debug_info ibm documenation
  7. "libdwarf: A Consumer Library Interface to DWARF". www.prevanders.net. Retrieved 2023-12-06.