DWARF

Last updated

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format (ELF), although it is independent of object file formats. [1] The name is a medieval fantasy complement to "ELF" that had no official meaning, although the name "Debugging With Arbitrary Record Formats" has since been proposed as a backronym. [1]

Contents

DWARF originated with the C compiler and sdb debugger in Unix System V Release 4 (SVR4). [1]

History

The first version of DWARF proved to use excessive amounts of storage, and an incompatible successor, DWARF-2, superseded it and added various encoding schemes to reduce data size. DWARF did not immediately gain universal acceptance; for instance, when Sun Microsystems adopted ELF as part of their move to Solaris, they opted to continue using stabs, in an embedding known as "stabs-in-elf". Linux followed suit, and DWARF-2 did not become the default until the late 1990s.

The DWARF Workgroup of the Free Standards Group released DWARF version 3 in January 2006, [2] adding (among other things) support for C++ namespaces, Fortran 90 allocatable data and additional compiler optimization techniques.

The DWARF committee published version 4 of DWARF, which offers "improved data compression, better description of optimized code, and support for new language features in C++", in 2010. [3]

Version 5 of the DWARF format was published in February 2017. [4] [5] It "incorporates improvements in many areas: better data compression, separation of debugging data from executable files, improved description of macros and source files, faster searching for symbols, improved debugging of optimized code, as well as numerous improvements in functionality and performance."

Structure

DWARF uses a data structure called a Debugging Information Entry (DIE) to represent each variable, type, procedure, etc. A DIE has a tag (e.g., DW_TAG_variable, DW_TAG_pointer_type, DW_TAG_subprogram) and attributes (key-value pairs). A DIE can have nested (child) DIEs, forming a tree structure. A DIE attribute can refer to another DIE anywhere in the tree—for instance, a DIE representing a variable would have a DW_AT_type entry pointing to the DIE describing the variable's type.

To save space, two large tables needed by symbolic debuggers are represented as byte-coded instructions for simple, special-purpose finite state machines. The Line Number Table, which maps code locations to source code locations and vice versa, also specifies which instructions are part of function prologues and epilogues. The Call Frame Information table allows debuggers to locate frames on the call stack.

DWARF has been divided into different sections like .debug_info, [6] .debug_frame, etc.

Tools

Libdwarf is a library that provides access to the DWARF debugging information in executable files and object files. [7]

Further reading

Michael Eager, chair of the DWARF Standards Committee, has written an introduction to debugging formats and DWARF 3, Introduction to the DWARF Debugging Format. [1]

Related Research Articles

<span class="mw-page-title-main">Common Lisp</span> Programming language standard

Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ANSI INCITS 226-1994 (S20018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

<span class="mw-page-title-main">GNU Debugger</span> Source-level debugger

The GNU Debugger (GDB) is a portable debugger that runs on many Unix-like systems and works for many programming languages, including Ada, Assembly, C, C++, D, Fortran, Haskell, Go, Objective-C, OpenCL C, Modula-2, Pascal, Rust, and partially others.

<span class="mw-page-title-main">PNG</span> Family of lossless compression file formats for image files

Portable Network Graphics is a raster-graphics file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for Graphics Interchange Format (GIF)—unofficially, the initials PNG stood for the recursive acronym "PNG's not GIF".

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

ScriptBasic is a scripting language variant of BASIC. The source of the interpreter is available as a C program under the LGPL license.

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.

OpenEXR is a high-dynamic range, multi-channel raster file format, released as an open standard along with a set of software tools created by Industrial Light & Magic (ILM), under a free software license similar to the BSD license.

<span class="mw-page-title-main">Executable</span> A file that causes a computer to follow indicated instructions

In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file that must be interpreted (parsed) by a program to be meaningful.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

An object file is a compiled file that contains machine code or bytecode, as well as other data and metadata, generated from source code during the compilation process. An object file is also generated from an assembler. The machine code that is generated is known as object code.

In computer programming, a magic number is any of the following:

<span class="mw-page-title-main">Free Pascal</span> Free compiler and IDE for Pascal and ObjectPascal

Free Pascal Compiler (FPC) is a compiler for the closely related programming-language dialects Pascal and Object Pascal. It is free software released under the GNU General Public License, with exception clauses that allow static linking against its runtime libraries and packages for any purpose in combination with any other software license.

Snappy is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

a.out is a file format used in older versions of Unix-like computer operating systems for executables, object code, and, in later systems, shared libraries. This is an abbreviated form of "assembler output", the filename of the output of Ken Thompson's PDP-7 assembler. The term was subsequently applied to the format of the resulting file to contrast with other formats for object code.

Minification is the process of removing all unnecessary characters from the source code of interpreted programming languages or markup languages without changing its functionality. These unnecessary characters usually include white space characters, new line characters, comments, and sometimes block delimiters, which are used to add readability to the code but are not required for it to execute. Minification reduces the size of the source code, making its transmission over a network more efficient. In programmer culture, aiming at extremely minified source code is the purpose of recreational code golf competitions.

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It was developed at Facebook for "scalable cross-language services development" and as of 2020 is an open source project in the Apache Software Foundation.

A debugging data format is a means of storing information about a compiled computer program for use by high-level debuggers. Modern debugging data formats store enough information to allow source-level debugging.

Intel Fortran Compiler, as part of Intel OneAPI HPC toolkit, is a group of Fortran compilers from Intel for Windows, macOS, and Linux.

LEB128 or Little Endian Base 128 is a variable-length code compression used to store arbitrarily large integers in a small number of bytes. LEB128 is used in the DWARF debug file format and the WebAssembly binary encoding for all integer literals.

References

  1. 1 2 3 4 Michael J. Eager (April 2012). "Introduction to the DWARF Debugging Format" (PDF). Retrieved 2015-01-08.
  2. "DWARF Version 3 Standard Released" (Press release). Free Standards Group. January 4, 2006. Retrieved 2007-06-25.
  3. "DWARF Version 4 Released". The DWARF committee. June 16, 2010. Retrieved 2010-06-24.
  4. "DWARF Version 5 Standard Released". The DWARF committee. February 15, 2017. Retrieved 2017-08-07.
  5. "DWARF 5 Standard". The DWARF committee. February 15, 2017. Retrieved 2017-08-07.
  6. .debug_info ibm documenation
  7. "libdwarf: A Consumer Library Interface to DWARF". www.prevanders.net. Retrieved 2023-12-06.