Object file

Last updated

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

Contents

The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the same machine code can be packaged in different object file formats. An object file may also work like a shared library.

The metadata that object files may include can be used for linking or debugging; it includes information to resolve symbolic cross-references between different modules, relocation information, stack unwinding information, comments, program symbols, and debugging or profiling information. Other metadata may include the date and time of compilation, the compiler name and version, and other identifying information.

The term "object program" dates from at least the 1950s:

A term in automatic programming for the machine language program produced by the machine by translating a source program written by the programmer in a language similar to algebraic notation. [1]

A linker is used to combine the object code into one executable program or library pulling in precompiled system libraries as needed.

Object file formats

There are many different object file formats; originally each type of computer had its own unique format, but with the advent of Unix and other portable operating systems, some formats, such as ELF and COFF, have been defined and used on different kinds of systems.

Some systems make a distinction between formats which are directly executable and formats which require processing by the linker. For example, OS/360 and successors call the first format a load module and the second an object module. In this case the files have entirely different formats. [2] DOS and Windows also have different file formats for executable files and object files, such as Portable Executable for executables and COFF for object files in 32-bit and 64-bit Windows.

Unix and Unix-like systems have used the same format for executable and object files, starting with the original a.out format. Some formats can contain machine code for different processors, with the correct one chosen by the operating system when the program is loaded. [3] [4]

The design and/or choice of an object file format is a key part of overall system design. It affects the performance of the linker and thus programmer turnaround while a program is being developed. If the format is used for executables, the design also affects the time programs take to begin running, and thus the responsiveness for users.

The GNU Project's Binary File Descriptor library (BFD library) provides a common API for the manipulation of object files in a variety of formats.

Absolute files

Many early computers, or small microcomputers, support only an absolute object format. Programs are not relocatable; they need to be assembled or compiled to execute at specific, predefined addresses. The file contains no relocation or linkage information. These files can be loaded into read/write memory, or stored in read-only memory. For example, the Motorola 6800 MIKBUG monitor contains a routine to read an absolute object file (SREC Format) from paper tape. [5] DOS COM files are a more recent example of absolute object files. [6]

Segmentation

Most object file formats are structured as separate sections of data, each section containing a certain type of data. These sections are known as "segments" due to the term "memory segment", which was previously a common form of memory management. When a program is loaded into memory by a loader, the loader allocates various regions of memory to the program. Some of these regions correspond to sections of the object file, and thus are usually known by the same names. Others, such as the stack, only exist at run time. In some cases, relocation is done by the loader (or linker) to specify the actual memory addresses. However, for many programs or architectures, relocation is not necessary, due to being handled by the memory management unit or by position-independent code. On some systems the segments of the object file can then be copied (paged) into memory and executed, without needing further processing. On these systems, this may be done lazily, that is, only when the segments are referenced during execution, for example via a memory-mapped file backed by the object file.

Types of data supported by typical object file formats: [7]

Segments in different object files may be combined by the linker according to rules specified when the segments are defined. Conventions exist for segments shared between object files; for instance, in DOS there are different memory models that specify the names of special segments and whether or not they may be combined. [8]

The debugging data format of debugging information may either be an integral part of the object file format, as in COFF, or a semi-independent format which may be used with several object formats, such as stabs or DWARF.

See also

Related Research Articles

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

<span class="mw-page-title-main">Netwide Assembler</span> Assembler for the Intel x86 architecture

The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture. It can be used to write 16-bit, 32-bit (IA-32) and 64-bit (x86-64) programs. It is considered one of the most popular assemblers for Linux and x86 chips.

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.

<span class="mw-page-title-main">Library (computing)</span> Collection of resources used to develop a computer program

In computer science, a library is a collection of resources that is leveraged during software development to implement a computer program.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

<span class="mw-page-title-main">COM file</span> Type of simple executable file

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system. With the introduction of Digital Research's CP/M, the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It was developed to replace the a.out format.

vmlinux Executable file containing the Linux kernel

vmlinux is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux, which includes Executable and Linkable Format (ELF) and Common Object File Format (COFF). The vmlinux file might be required for kernel debugging, symbol table generation or other operations, but must be made bootable before being used as an operating system kernel by adding a multiboot header, bootsector and setup routines.

The Microsoft Macro Assembler (MASM) is an x86 assembler that uses the Intel syntax for MS-DOS and Microsoft Windows. Beginning with MASM 8.0, there are two versions of the assembler: One for 16-bit & 32-bit assembly sources, and another (ML64) for 64-bit sources only.

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form, making it possible to store on non-binary media such as paper tape, punch cards, etc., to display on text terminals or be printed on line-oriented printers. The format is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a object or executable file in hexadecimal format. In some applications, the Intel hex format is also used as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution. There are various tools to convert files between hexadecimal and binary format, and vice versa.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

The Object Module Format (OMF) is an object file format used primarily for software intended to run on Intel 80x86 microprocessors. It was originally developed by Intel around 1975–1977 for ISIS-II, targeting the 8-bit 8080/8085 processors. This variant later became known as OMF-80. As OMF-86 it was adapted to the 16-bit 8086 processor in 1978. Version 4.0 for the 8086 family was released in 1981 under the name Relocatable Object Module Format, and is perhaps best known to DOS users as an .OBJ file. Versions for the 80286 (OMF-286) and the 32-bit 80386 processors (OMF-386) were introduced in 1981 and 1985, respectively. It has since been standardized by the Tool Interface Standards Committee and was also extended by Microsoft and IBM (IBM-OMF). Intel also adapted the format to the 8051 microcontroller (OMF-51 and AOMF).

A debugging data format is a means of storing information about a compiled computer program for use by high-level debuggers. Modern debugging data formats store enough information to allow source-level debugging.

Obj or OBJ may refer to:

In computer programming, the Arm Image Format (AIF) is an object file format used primarily for software intended to run on ARM microprocessors. It was introduced by Acorn Computers for use with their Archimedes computer. It can optionally facilitate debugging, including under operating systems running on other processor architectures.

The OS/360 Object File Format is the standard object module file format for the IBM DOS/360, OS/360 and VM/370, Univac VS/9, and Fujitsu BS2000 mainframe operating systems. In the 1990s, the format was given an extension with the XSD-type record for the MVS Operating System to support longer module names in the C Programming Language. This format is still in use by the z/VSE operating system. In contrast, it has been superseded by the GOFF file format on the MVS Operating System and on the z/VM Operating System. Since the MVS and z/VM loaders will still handle this older format, some compilers have chosen to continue to produce this format instead of the newer GOFF format.

References

  1. Wrubel, Marshal H. (1959). A primer of programming for digital computers. New York, USA: McGraw-Hill. p. 222. Retrieved 2020-07-31.
  2. IBM OS Linkage Editor and Loader (PDF). IBM Corporation. 1973. p. 16. Retrieved 2012-08-06.
  3. "Universal Binaries and 32-bit/64-bit PowerPC Binaries". OS X ABI Mach-O File Format Reference. Apple Inc. 2009-02-04 [2003]. Archived from the original on 2014-09-04.
  4. "FatELF: Universal Binaries for Linux" . Retrieved 2020-08-02.
  5. Wiles, Mike; Felix, Andre. MCM6830L7 MIKBUG/MINIBUG ROM (PDF). Motorola Semiconductor Products, Inc. Retrieved 2020-07-31.
  6. Godse, Deepali A.; Godse, Atul P. (2008). Microprocessor (1 ed.). Pune, India: Technical Publications. pp. 3–15. ISBN   978-81-8431-355-0.
  7. Mauerer, Wolfgang (2010). "Appendix E. The ELF Binary Format". Professional Linux Kernel Architecture. John Wiley & Sons. p. Appendix E. ISBN   978-0-470-34343-2 . Retrieved 2020-08-01.
  8. Irvine, Kip R. (1993). Assembly language for the IBM-PC (2 ed.). New York, USA: Macmillan. ISBN   0-02-359651-1.

Further reading