COFF

Last updated
COFF
Filename extension
none, .o, .obj, .lib [1]
Internet media type application/x-coff, application/x-coffexec
Magic number
Developed by AT&T Corporation
Type of format Binary, executable, object, shared libraries
Extended to XCOFF, ECOFF, Portable Executable, Executable and Linkable Format

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows (Portable Executable), in UEFI environments and in some embedded development systems.

Contents

History

The original Unix object file format a.out is unable to adequately support shared libraries, foreign format identification,[ citation needed ] or explicit address linkage.[ citation needed ] As development of Unix-like systems continued both inside and outside AT&T, different solutions to these and other issues emerged.

COFF was introduced in 1983, in AT&T's UNIX System V for non-VAX 32-bit platforms such as the 3B20.[ citation needed ] Improvements over the existing AT&T a.out format included arbitrary sections, explicit processor declarations, and explicit address linkage.

However, the COFF design was both too limited and incompletely specified: there was a limit on the maximum number of sections, a limit on the length of section names, included source files, and the symbolic debugging information was incapable of supporting real world languages such as C, much less newer languages like C++, or new processors. All real world implementations of COFF were necessarily violations of the standard as a result. This led to numerous COFF extensions. IBM used the XCOFF format in AIX; DEC, SGI and others used ECOFF; and numerous SysV ports and tool chains targeting embedded development each created their own, incompatible, variations.

With the release of SVR4, AT&T replaced COFF with ELF.

While extended versions of COFF continue to be used for some Unix and Unix-like platforms, primarily in embedded systems, perhaps the most widespread use of the COFF format today is in Microsoft's Portable Executable (PE) format. Developed for Windows NT, the PE format (sometimes written as PE/COFF) uses a COFF header for object files, and as a component of the PE header for executable files. [3]

Features

COFF's main improvement over a.out was the introduction of multiple named sections in the object file. Different object files could have different numbers and types of sections.

Symbolic debugging information

The COFF symbolic debugging information consists of symbolic (string) names for program functions and variables, and line number information, used for setting breakpoints and tracing execution.

Symbolic names are stored in the COFF symbol table. Each symbol table entry includes a name, storage class, type, value and section number. Short names (8 characters or fewer) are stored directly in the symbol table; longer names are stored as an offset into the string table at the end of the COFF object.

Storage classes describe the type entity the symbol represents, and may include external variables (C_EXT), automatic (stack) variables (C_AUTO), register variables (C_REG), functions (C_FCN), and many others. The symbol type describes the interpretation of the symbol entity's value and includes values for all the C data types.

When compiled with appropriate options, a COFF object file will contain line number information for each possible break point in the text section of the object file. Line number information takes two forms: in the first, for each possible break point in the code, the line number table entry records the address and its matching line number. In the second form, the entry identifies a symbol table entry representing the start of a function, enabling a breakpoint to be set using the function's name.

Note that COFF was not capable of representing line numbers or debugging symbols for included source as with header files rendering the COFF debugging information virtually useless without incompatible extensions.

Relative virtual address

When a COFF file is generated, it is not usually known where in memory it will be loaded. The virtual address where the first byte of the file will be loaded is called image base address. The rest of the file is not necessarily loaded in a contiguous block, but in different sections.

Relative virtual addresses (RVAs) are not to be confused with standard virtual addresses. A relative virtual address is the virtual address of an object from the file once it is loaded into memory, minus the base address of the file image. If the file were to be mapped literally from disk to memory, the RVA would be the same as that of the offset into the file, but this is actually quite unusual.

Note that the RVA term is only used with objects in the image file. Once loaded into memory, the image base address is added, and ordinary VAs are used.

Problems

The COFF file header stores the date and time that the object file was created as a 32-bit binary integer, representing the number of seconds since the Unix epoch, 1 January 1970 00:00:00  UTC. Dates occurring after 19 January 2038 cannot be stored in this format, resulting in an instance of the year 2038 problem. [4] :11–4

See also

Related Research Articles

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

In computer programming, a magic number is any of the following:

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier, constant, procedure and function in a program's source code is associated with information relating to its declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format (ELF), although it is independent of object file formats. The name is a medieval fantasy complement to "ELF" that had no official meaning, although the name "Debugging With Arbitrary Record Formats" has since been proposed as a backronym.

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It was developed to replace the a.out format.

vmlinux Executable file containing the Linux kernel

vmlinux is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux, which includes Executable and Linkable Format (ELF) and Common Object File Format (COFF). The vmlinux file might be required for kernel debugging, symbol table generation or other operations, but must be made bootable before being used as an operating system kernel by adding a multiboot header, bootsector and setup routines.

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

Dynamic-link library (DLL) is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX, or DRV . The file formats for DLLs are the same as for Windows EXE files – that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any combination.

a.out is a file format used in older versions of Unix-like computer operating systems for executables, object code, and, in later systems, shared libraries. This is an abbreviated form of "assembler output", the filename of the output of Ken Thompson's PDP-7 assembler. The term was subsequently applied to the format of the resulting file to contrast with other formats for object code.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

Program database (PDB) is a file format for storing debugging information about a program. PDB files commonly have a .pdb extension. A PDB file is typically created from source files during compilation. It stores a list of all symbols in a module with their addresses and possibly the name of the file and the line on which the symbol was declared. This symbol information is not stored in the module itself, because it takes up a lot of space.

The Object Module Format (OMF) is an object file format used primarily for software intended to run on Intel 80x86 microprocessors. It was originally developed by Intel around 1975–1977 for ISIS-II, targetting the 8-bit 8080/8085 processors. This variant later became known as OMF-80. As OMF-86 it was adapted to the 16-bit 8086 processor in 1978. Version 4.0 for the 8086 family was released in 1981 under the name Relocatable Object Module Format, and is perhaps best known to DOS users as an .OBJ file. Versions for the 80286 (OMF-286) and the 32-bit 80386 processors (OMF-386) were introduced in 1981 and 1985, respectively. It has since been standardized by the Tool Interface Standards Committee and was also extended by Microsoft and IBM (IBM-OMF). Intel also adapted the format to the 8051 microcontroller (OMF-51 and AOMF).

A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable. This information allows a symbolic debugger to gain access to information from the source code of the binary, such as the names of identifiers, including variables and routines.

Memory footprint refers to the amount of main memory that a program uses or references while running.

In computer programming, the Arm Image Format (AIF) is an object file format used primarily for software intended to run on ARM microprocessors. It was introduced by Acorn Computers for use with their Archimedes computer. It can optionally facilitate debugging, including under operating systems running on other processor architectures.

References

  1. "LIB Reference (Embedded Visual C++ Programmers Guide)". msdn.microsoft.com. Archived from the original on 2003-08-25. Retrieved 2021-02-04.
  2. "IMAGE_FILE_HEADER structure (winnt.h)". 2018-05-12. Retrieved 2023-12-22.
  3. Microsoft Corporation 2006b
  4. "11 Common Object File Format (COFF)". UNIX System V/386 Release 3.2 Programmer's Guide, Volume II (PDF). Prentice-Hall. 1989. ISBN   0-13-944885-3.

Further reading