Loader (computing)

Last updated

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

Contents

All operating systems that support program loading have loaders, apart from highly specialized computer systems that only have a fixed set of specialized programs. Embedded systems typically do not have loaders, and instead, the code executes directly from ROM or similar. In order to load the operating system itself, as part of booting, a specialized boot loader is used. In many operating systems, the loader resides permanently in memory, though some operating systems that support virtual memory may allow the loader to be located in a region of memory that is pageable.

In the case of operating systems that support virtual memory, the loader may not actually copy the contents of executable files into memory, but rather may simply declare to the virtual memory subsystem that there is a mapping between a region of memory allocated to contain the running program's code and the contents of the associated executable file. (See memory-mapped file.) The virtual memory subsystem is then made aware that pages with that region of memory need to be filled on demand if and when program execution actually hits those areas of unfilled memory. This may mean parts of a program's code are not actually copied into memory until they are actually used, and unused code may never be loaded into memory at all.

Responsibilities

In Unix, the loader is the handler for the system call execve(). [1] The Unix loader's tasks include:

  1. validation (permissions, memory requirements etc.);
  2. memory-mapping the executable object from the disk into main memory;
  3. copying the command-line arguments into virtual memory;
  4. initializing registers (e.g., the stack pointer);
  5. jumping to the program entry point (_start).

In Microsoft Windows 7 and above, the loader is the LdrInitializeThunk function contained in ntdll.dll, that does the following:

  1. initialisation of structures in the DLL itself (i.e. critical sections, module lists);
  2. validation of executable to load;
  3. creation of a heap (via the function RtlCreateHeap);
  4. allocation of environment variable block and PATH block;
  5. addition of executable and NTDLL to the module list (a doubly-linked list);
  6. loading of KERNEL32.DLL to obtain several important functions, for instance BaseThreadInitThunk;
  7. loading of executable's imports (i.e. dynamic-link libraries) recursively (check the imports' imports, their imports and so on);
  8. in debug mode, raising of system breakpoint;
  9. initialisation of DLLs;
  10. garbage collection;
  11. calling NtContinue on the context parameter given to the loader function (i.e. jumping to RtlUserThreadStart, that will start the executable)

Relocating loaders

Some operating systems need relocating loaders, which adjust addresses (pointers) in the executable to compensate for variations in the address at which loading starts. The operating systems that need relocating loaders are those in which a program is not always loaded into the same location in the (virtual) address space and in which pointers are absolute addresses rather than offsets from the program's base address. Some well-known examples are IBM's OS/360 for their System/360 mainframes, and its descendants, including z/OS for the z/Architecture mainframes.

OS/360 & Derivatives

In OS/360 and descendant systems, the (privileged) operating system facility is called IEWFETCH, [2] and is an internal component of the OS Supervisor, whereas the (non-privileged) LOADER application can perform many of the same functions, plus those of the Linkage Editor, and is entirely external to the OS Supervisor (although it certainly uses many Supervisor services).

IEWFETCH utilizes highly specialized channel programs, and it is theoretically possible to load and to relocate an entire executable within one revolution of the DASD media (about 16.6 ms maximum, 8.3 ms average, on "legacy" 3,600 rpm drives). For load modules which exceed a track in size, it is also possible to load and to relocate the entire module without losing a revolution of the media.

IEWFETCH also incorporates facilities for so-called overlay structures, and which facilitates running potentially very large executables in a minimum memory model (as small as 44 KB on some versions of the OS, but 88 KB and 128 KB are more common).

The OS's nucleus (the always resident portion of the Supervisor) itself is formatted in a way that is compatible with a stripped-down version of IEWFETCH. Unlike normal executables, the OS's nucleus is "scatter loaded": parts of the nucleus are loaded into different portions of memory; in particular, certain system tables are required to reside below the initial 64 KB, while other tables and code may reside elsewhere.

The system's Linkage Editor application is named IEWL. [3] IEWL's main function is to associate load modules (executable programs) and object modules (the output from, say, assemblers and compilers), including "automatic calls" to libraries (high-level language "built-in functions"), into a format which may be most efficiently loaded by IEWFETCH. There are a large number of editing options, but for a conventional application only a few of these are commonly employed.

The load module format includes an initial "text record", followed immediately by the "relocation and/or control record" for that text record, followed by more instances of text record and relocation and/or control record pairs, until the end of the module.

The text records are usually very large; the relocation and/or control records are small as IEWFETCH's three relocation and/or control record buffers are fixed at 260 bytes (smaller relocation and/or control records are certainly possible, but 260 bytes is the maximum possible, and IEWL ensures that this limitation is complied with, by inserting additional relocation records, as required, before the next text record, if necessary; in this special case, the sequence of records may be: ..., text record, relocation record, ..., control record, text record, ...).

A special byte within the relocation and/or control record buffer is used as a "disabled bit spin" communication area, and is initialized to a unique value. The Read CCW for that relocation and/or control record has the Program Controlled Interrupt bit set. The processor is thereby notified when that CCW has been accessed by the channel via a special IOS exit. At this point the processor enters the "disabled bit spin" loop (sometimes called "the shortest loop in the world"). Once that byte changes from its initialized value, the CPU exits the bit spin, and relocation occurs, during the "gap" within the media between the relocation and/or control record and the next text record. If relocation is finished before the next record, the NOP CCW following the Read will be changed to a TIC, and loading and relocating will proceed using the next buffer; if not, then the channel will stop at the NOP CCW, until it is restarted by IEWFETCH via another special IOS exit. The three buffers are in a continuous circular queue, each pointing to its next, and the last pointing to the first, and three buffers are constantly reused as loading and relocating proceeds.

IEWFETCH can, thereby, load and relocate a load module of any practical size, and in the minimum possible time.

Dynamic linkers

Dynamic linking loaders are another type of loader that load and link shared libraries (like .so files, .dll files or .dylib files) to already loaded running programs.

Where such shared libraries can be shared by multiple processes, with only one single copy of the shared code possibly appearing at a different (virtual) address in each process's address space, the code in the shared library is required to be relocatable, ie the library must only use self-relative or code segment base-relative internal addresses throughout. Some processor have instructions that can use self-relative code-references in order to facilitate this.

See also

Related Research Articles

<span class="mw-page-title-main">Linker (computing)</span> Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

<span class="mw-page-title-main">Booting</span> Process of starting a computer

In computing, booting is the process of starting a computer as initiated via hardware such as a button or by a software command. After it is switched on, a computer's central processing unit (CPU) has no software in its main memory, so some process must load software into memory before it can be executed. This may be done by hardware or firmware in the CPU, or by a separate processor in the computer system.

A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or runtime, rather than being copied by a linker when it creates a single monolithic executable file for the program.

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.

<span class="mw-page-title-main">Library (computing)</span> Collection of non-volatile resources used by computer programs

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

An object file is a computer file containing object code, that is, machine code output of an assembler or compiler. The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the same machine code can be packaged in different object file formats. An object file may also work like a shared library.

In computing, rebasing is the process of modifying data based on one reference to another. It can be one of the following:

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

The Macintosh Toolbox implements many of the high-level features of the Classic Mac OS, including a set of application programming interfaces for software development on the platform. The Toolbox consists of a number of "managers," software components such as QuickDraw, responsible for drawing onscreen graphics, and the Menu Manager, which maintain data structures describing the menu bar. As the original Macintosh was designed without virtual memory or memory protection, it was important to classify code according to when it should be loaded into memory or kept on disk, and how it should be accessed. The Toolbox consists of subroutines essential enough to be permanently kept in memory and accessible by a two-byte machine instruction; however it excludes core "kernel" functionality such as memory management and the file system. Note that the Toolbox does not draw the menu onscreen: menus were designed to have a customizable appearance, so the drawing code was stored in a resource, which could be on a disk.

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

Dynamic-link library (DLL) is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX, or DRV . The file formats for DLLs are the same as for Windows EXE files – that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any combination.

In computer programming, the term hooking covers a range of techniques used to alter or augment the behaviour of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components. Code that handles such intercepted function calls, events or messages is called a hook.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

In computer programming, DLL injection is a technique used for running code within the address space of another process by forcing it to load a dynamic-link library. DLL injection is often used by external programs to influence the behavior of another program in a way its authors did not anticipate or intend. For example, the injected code could hook system function calls, or read the contents of password textboxes, which cannot be done the usual way. A program used to inject arbitrary code into arbitrary processes is called a DLL injector.

<span class="mw-page-title-main">Overlay (programming)</span>

In a general computing sense, overlaying means "the process of transferring a block of program code or other data into main memory, replacing what is already stored". Overlaying is a programming method that allows programs to be larger than the computer's main memory. An embedded system would normally use overlays because of the limitation of physical memory, which is internal memory for a system-on-chip, and the lack of virtual memory facilities.

Dynamic loading is a mechanism by which a computer program can, at run time, load a library into memory, retrieve the addresses of functions and variables contained in the library, execute those functions or access those variables, and unload the library from memory. It is one of the 3 mechanisms by which a computer program can use some other software; the other two are static linking and dynamic linking. Unlike static linking and dynamic linking, dynamic loading allows a computer program to start up in the absence of these libraries, to discover available libraries, and to potentially gain additional functionality.

<span class="mw-page-title-main">OS/360 and successors</span> Operating system for IBM S/360 and later mainframes

OS/360, officially known as IBM System/360 Operating System, is a discontinued batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964; it was influenced by the earlier IBSYS/IBJOB and Input/Output Control System (IOCS) packages for the IBM 7090/7094 and even more so by the PR155 Operating System for the IBM 1410/7010 processors. It was one of the earliest operating systems to require the computer hardware to include at least one direct access storage device.

<span class="mw-page-title-main">OS/VS2 (SVS)</span>

Single Virtual Storage (SVS) refers to Release 1 of Operating System/Virtual Storage 2 (OS/VS2); it is the successor system to the MVT option of Operating System/360. OS/VS2 (SVS) was a stopgap measure pending the availability of MVS, although IBM provided support and enhancements to SVS long after shipping MVS.

The IBM System/360 architecture is the model independent architecture for the entire S/360 line of mainframe computers, including but not limited to the instruction set architecture. The elements of the architecture are documented in the IBM System/360 Principles of Operation and the IBM System/360 I/O Interface Channel to Control Unit Original Equipment Manufacturers' Information manuals.

The OS/360 Object File Format is the standard object module file format for the IBM DOS/360, OS/360 and VM/370, Univac VS/9, and Fujitsu BS2000 mainframe operating systems. In the 1990s, the format was given an extension with the XSD-type record for the MVS Operating System to support longer module names in the C Programming Language. This format is still in use by the z/VSE operating system. In contrast, it has been superseded by the GOFF file format on the MVS Operating System and on the z/VM Operating System. Since the MVS and z/VM loaders will still handle this older format, some compilers have chosen to continue to produce this format instead of the newer GOFF format.

References

  1. "exec". The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition. The Open Group. Retrieved 2008-06-23.
  2. IBM Corporation (1972). IBM OS MVT Supervisor (PDF).
  3. IBM Corporation (1972). IBM OS Linkage Editor and Loader (PDF).