Relocation (computing)

Last updated

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. [1] [2] Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

Contents

Relocation is typically done by the linker at link time, but it can also be done at load time by a relocating loader, or at run time by the running program itself. Some architectures avoid relocation entirely by deferring address assignment to run time; as, for example, in stack machines with zero address arithmetic or in some segmented architectures where every compilation unit is loaded into a separate segment.

Segmentation

Object files are segmented into various memory segment or section types. Example segment types include code segment (.text), initialized data segment (.data), uninitialized data segment (.bss), or others as established by the programmer, such as common segments, or named static segments.

Relocation table

The relocation table is a list of pointers created by the translator (a compiler or assembler) and stored in the object or executable file. Each entry in the table, or "fixup", is a pointer to an absolute address in the object code that must be changed when the loader relocates the program so that it will refer to the correct location. Fixups are designed to support relocation of the program as a complete unit. In some cases, each fixup in the table is itself relative to a base address of zero, so the fixups themselves must be changed as the loader moves through the table. [2]

In some architectures a fixup that crosses certain boundaries (such as a segment boundary) or that is not aligned on a word boundary is illegal and flagged as an error by the linker. [3]

DOS and 16-bit Windows

Far pointers (32-bit pointers with segment:offset, used to address 20-bit 640 KB memory space available to DOS programs), which point to code or data within a DOS executable (EXE), do not have absolute segments, because the actual address of code/data depends on where the program is loaded in memory and this is not known until the program is loaded.

Instead, segments are relative values in the DOS EXE file. These segments need to be corrected, when the executable has been loaded into memory. The EXE loader uses a relocation table to find the segments which need to be adjusted.

32-bit Windows

With 32-bit Windows operating systems, it is not mandatory to provide relocation tables for EXE files, since they are the first image loaded into the virtual address space and thus will be loaded at their preferred base address.

For both DLLs and for EXEs which opt into address space layout randomization (ASLR), an exploit mitigation technique introduced with Windows Vista, relocation tables once again become mandatory because of the possibility that the binary may be dynamically moved before being executed, even though they are still the first thing loaded in the virtual address space.

64-bit Windows

When running native 64-bit binaries on Windows Vista and above, ASLR is mandatory[ citation needed ], and thus relocation sections cannot be omitted by the compiler.

Unix-like systems

The Executable and Linkable Format (ELF) executable format and shared library format used by most Unix-like systems allows several types of relocation to be defined. [4]

Relocation procedure

The linker reads segment information and relocation tables in the object files and performs relocation by:

Example

The following example uses Donald Knuth's MIX architecture and MIXAL assembly language. The principles are the same for any architecture, though the details will change.

Relocation example.tif

See also

Related Research Articles

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

<span class="mw-page-title-main">CP/M</span> Discontinued family of computer operating systems

CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/85-based microcomputers by Gary Kildall of Digital Research, Inc. CP/M is a disk operating system and its purpose is to organize files on a magnetic storage medium, and to load and run programs stored on a disk. Initially confined to single-tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations and were migrated to 16-bit processors.

<span class="mw-page-title-main">Library (computing)</span> Collection of non-volatile resources used by computer programs

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

In computer science, self-modifying code is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance. The term is usually only applied to code where the self-modification is intentional, not in situations where code accidentally modifies itself due to an error such as a buffer overflow.

<span class="mw-page-title-main">A20 line</span> Signal in the system bus of an x86-based computer system

The A20, or address line 20, is one of the electrical lines that make up the system bus of an x86-based computer system. The A20 line in particular is used to transmit the 21st bit on the address bus.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.

<span class="mw-page-title-main">COM file</span> Type of simple executable file

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system. With the introduction of Digital Research's CP/M, the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It was developed to replace the a.out format.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

The Program Segment Prefix (PSP) is a data structure used in DOS systems to store the state of a program. It resembles the Zero Page in the CP/M operating system. The PSP has the following structure:

The zero page or base page is the block of memory at the very beginning of a computer's address space; that is, the page whose starting address is zero. The size of a page depends on the context, and the significance of zero page memory versus higher addressed memory is highly dependent on machine architecture. For example, the Motorola 6800 and MOS Technology 6502 processor families treat the first 256 bytes of memory specially, whereas many other processors do not.

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form, making it possible to store on non-binary media such as paper tape, punch cards, etc., to display on text terminals or be printed on line-oriented printers. The format is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a object or executable file in hexadecimal format. In some applications, the Intel hex format is also used as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution. There are various tools to convert files between hexadecimal and binary format, and vice versa.

<span class="mw-page-title-main">Overlay (programming)</span>

In a general computing sense, overlaying means "the process of transferring a block of program code or other data into main memory, replacing what is already stored". Overlaying is a programming method that allows programs to be larger than the computer's main memory. An embedded system would normally use overlays because of the limitation of physical memory, which is internal memory for a system-on-chip, and the lack of virtual memory facilities.

The Object Module Format (OMF) is an object file format used primarily for software intended to run on Intel 80x86 microprocessors. It was originally developed by Intel around 1975–1977 for ISIS-II, targetting the 8-bit 8080/8085 processors. This variant later became known as OMF-80. As OMF-86 it was adapted to the 16-bit 8086 processor in 1978. Version 4.0 for the 8086 family was released in 1981 under the name Relocatable Object Module Format, and is perhaps best known to DOS users as an .OBJ file. Versions for the 80286 (OMF-286) and the 32-bit 80386 processors (OMF-386) were introduced in 1981 and 1985, respectively. It has since been standardized by the Tool Interface Standards Committee and was also extended by Microsoft and IBM (IBM-OMF). Intel also adapted the format to the 8051 microcontroller (OMF-51 and AOMF).

In computer programming, a self-relocating program is a program that relocates its own address-dependent instructions and data when run, and is therefore capable of being loaded into memory at any address. In many cases, self-relocating code is also a form of self-modifying code.

A master boot record (MBR) is a special type of boot sector at the very beginning of partitioned computer mass storage devices like fixed disks or removable drives intended for use with IBM PC-compatible systems and beyond. The concept of MBRs was publicly introduced in 1983 with PC DOS 2.0.

The GOFF specification was developed for IBM's MVS operating system to supersede the IBM OS/360 Object File Format to compensate for weaknesses in the older format.

References

  1. "Types of Object Code". iRMX 86 Application Loader Reference Manual (PDF). Intel. pp. 1-2–1-3. Archived (PDF) from the original on 2020-01-11. Retrieved 2020-01-11. […] Absolute code, and an absolute object module, is code that has been processed by LOC86 to run only at a specific location in memory. The Loader loads an absolute object module only into the specific location the module must occupy. Position-independent code (commonly referred to as PIC) differs from absolute code in that PIC can be loaded into any memory location. The advantage of PIC over absolute code is that PIC does not require you to reserve a specific block of memory. When the Loader loads PIC, it obtains iRMX 86 memory segments from the pool of the calling task's job and loads the PIC into the segments. A restriction concerning PIC is that, as in the PL/M-86 COMPACT model of segmentation […], it can have only one code segment and one data segment, rather than letting the base addresses of these segments, and therefore the segments themselves, vary dynamically. This means that PIC programs are necessarily less than 64K bytes in length. PIC code can be produced by means of the BIND control of LINK86. Load-time locatable code (commonly referred to as LTL code) is the third form of object code. LTL code is similar to PIC in that LTL code can be loaded anywhere in memory. However, when loading LTL code, the Loader changes the base portion of pointers so that the pointers are independent of the initial contents of the registers in the microprocessor. Because of this fixup (adjustment of base addresses), LTL code can be used by tasks having more than one code segment or more than one data segment. This means that LTL programs may be more than 64K bytes in length. FORTRAN 86 and Pascal 86 automatically produce LTL code, even for short programs. LTL code can be produced by means of the BIND control of LINK86. […]
  2. 1 2 Levine, John R. (2000) [October 1999]. "Chapter 1: Linking and Loading & Chapter 3: Object Files". Linkers and Loaders. The Morgan Kaufmann Series in Software Engineering and Programming (1 ed.). San Francisco, California, USA: Morgan Kaufmann. p. 5. ISBN   1-55860-496-0. OCLC   42413382. Archived from the original on 2012-12-05. Retrieved 2020-01-12. Code: Errata:
  3. Borland (1999-09-01) [1998-07-02]. "Borland article #15961: Coping with 'Fixup Overflow' messages". community.borland.com. Technical Information Database - Product: Borland C++ 3.1. TI961C.txt #15961. Archived from the original on 2008-07-07. Retrieved 2007-01-15.
  4. "Executable and Linkable Format (ELF)" (PDF). skyfree.org. Tool Interface Standards (TIS) Portable Formats Specification, Version 1.1. Archived (PDF) from the original on 2019-12-24. Retrieved 2018-10-01.

Further reading