Loader (computing)

In computer systems, a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves either memory-mapping or copying the contents of the executable file containing the program instructions into memory, and then carrying out any other tasks needed to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

All operating systems that support program loading have loaders, apart from highly specialized computer systems that only have a fixed set of specialized programs. Embedded systems typically do not have loaders; instead, their code executes directly from ROM or similar non-volatile memory. In order to load the operating system itself, as part of booting, a specialized boot loader is used. In many operating systems, the loader resides permanently in memory, though some operating systems that support virtual memory may allow the loader to be located in a region of memory that is pageable.

In the case of operating systems that support virtual memory, the loader may not actually copy the contents of executable files into memory, but rather may simply declare to the virtual memory subsystem that there is a mapping between a region of memory allocated to contain the running program's code and the contents of the associated executable file. (See memory-mapped file.) The virtual memory subsystem is then made aware that pages within that region of memory need to be filled on demand if and when program execution reaches those unfilled areas. This may mean that parts of a program's code are not copied into memory until they are used, and unused code may never be loaded into memory at all.
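As a rough illustration of mapping rather than copying, the following Python sketch maps a file read-only and touches only a small slice of it. The temporary file stands in for an executable, and the helper name `read_mapped_slice` is hypothetical; a real loader sets up such mappings inside the kernel rather than through a scripting API.

```python
import mmap
import os
import tempfile

def read_mapped_slice(path, offset, length):
    """Map a file read-only and return `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Creating the mapping does not copy
        # the file; the OS typically fills pages only when they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing touches the underlying pages and returns a bytes copy.
            return view[offset:offset + length]

# Stand-in "executable" (an assumption for the demo): 64 KiB of filler
# followed by a 5-byte marker.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 65536 + b"ENTRY")
    demo_path = tmp.name

marker = read_mapped_slice(demo_path, 65536, 5)
os.unlink(demo_path)
print(marker)
```

Only the pages backing the requested slice need to be resident for the read to succeed, which mirrors how unused parts of a mapped program may never be brought into memory.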

Responsibilities

In Unix, the loader is the handler for the system call execve(). [1] The Unix loader's tasks include:

  1. validation (permissions, memory requirements etc.);
  2. memory-mapping the executable object from the disk into main memory;
  3. copying the command-line arguments into virtual memory;
  4. initializing registers (e.g., the stack pointer);
  5. jumping to the program entry point (_start).
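The validation and entry-point steps can be sketched in user space. The example below assumes a 64-bit little-endian ELF image, which is one common Unix executable format but is not named in the text above; the header bytes are built in memory for the demonstration, and `entry_point` is a hypothetical helper, not part of any kernel interface.

```python
import struct

# Layout of the 64-byte ELF64 file header, little-endian, no padding.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def entry_point(image: bytes) -> int:
    """Check the header of an ELF64 little-endian image and return e_entry."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("file too short to hold an ELF header")
    fields = ELF64_HEADER.unpack_from(image)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad magic number")
    if ident[4] != 2 or ident[5] != 1:  # EI_CLASS = 64-bit, EI_DATA = little-endian
        raise ValueError("not a 64-bit little-endian image")
    return fields[4]  # e_entry: the address the loader finally jumps to

# Hypothetical header for the demo: an executable whose entry is 0x401000.
fake_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
fake_header = ELF64_HEADER.pack(
    fake_ident, 2, 0x3E, 1, 0x401000, 64, 0, 0, 64, 56, 0, 64, 0, 0
)
print(hex(entry_point(fake_header)))
```

A real loader performs many more checks (permissions, segment sizes, interpreter lookup) before it maps anything; this sketch covers only the header sanity check and the lookup of the entry address.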

In Microsoft Windows 7 and above, the loader is the LdrInitializeThunk function contained in ntdll.dll, which does the following:

  1. initialisation of structures in the DLL itself (i.e. critical sections, module lists);
  2. validation of executable to load;
  3. creation of a heap (via the function RtlCreateHeap);
  4. allocation of environment variable block and PATH block;
  5. addition of executable and NTDLL to the module list (a doubly-linked list);
  6. loading of KERNEL32.DLL to obtain several important functions, for instance BaseThreadInitThunk;
  7. loading of executable's imports (i.e. dynamic-link libraries) recursively (check the imports' imports, their imports and so on);
  8. in debug mode, raising of system breakpoint;
  9. initialisation of DLLs;
  10. garbage collection;
  11. calling NtContinue on the context parameter given to the loader function (i.e. jumping to RtlUserThreadStart, which will start the executable).
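The recursive import step in the list above (item 7) amounts to a depth-first walk over a dependency graph, with a record of modules already seen so that shared or circular imports are not loaded twice. The sketch below is a simplified model: the `IMPORTS` table and `load_order` function are hypothetical, and the order produced is not claimed to match the order Windows actually uses.

```python
def load_order(root, imports):
    """Return modules in the order a depth-first walk finishes them."""
    loaded = []   # stands in for the loader's module list
    seen = set()  # modules already loaded or currently being loaded

    def load(name):
        if name in seen:  # shared dependency or import cycle: skip
            return
        seen.add(name)
        for dep in imports.get(name, ()):
            load(dep)     # check the imports' imports, and so on
        loaded.append(name)

    load(root)
    return loaded

# Hypothetical import table, for illustration only.
IMPORTS = {
    "app.exe": ["user32.dll", "kernel32.dll"],
    "user32.dll": ["gdi32.dll", "kernel32.dll"],
    "gdi32.dll": ["kernel32.dll"],
    "kernel32.dll": ["ntdll.dll"],
    "ntdll.dll": [],
}
print(load_order("app.exe", IMPORTS))
```

Because each module is appended only after its dependencies, the resulting list is also a workable order in which to run initialisation routines.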

Relocating loaders

Some operating systems need relocating loaders, which adjust addresses (pointers) in the executable to compensate for variations in the address at which loading starts. The operating systems that need relocating loaders are those in which a program is not always loaded into the same location in the (virtual) address space and in which pointers are absolute addresses rather than offsets from the program's base address. Some well-known examples are IBM's OS/360 for their System/360 mainframes, and its descendants, including z/OS for the z/Architecture mainframes.
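The adjustment itself is simple arithmetic: every absolute address stored in the image is shifted by the difference between the actual load address and the address assumed when the program was linked. The sketch below assumes 32-bit little-endian pointers and a plain list of fix-up offsets; both are illustrative choices and not the format of any particular system.

```python
import struct

def relocate(image: bytes, fixups, link_base: int, load_base: int) -> bytes:
    """Shift each 32-bit absolute address listed in `fixups` by the load delta."""
    delta = load_base - link_base
    out = bytearray(image)
    for offset in fixups:
        (addr,) = struct.unpack_from("<I", out, offset)
        # Wrap to 32 bits so the patched value still fits in the slot.
        struct.pack_into("<I", out, offset, (addr + delta) & 0xFFFFFFFF)
    return bytes(out)

# Toy image linked for base 0x1000: one data word, then a pointer to 0x1010.
image = struct.pack("<II", 0xDEADBEEF, 0x1010)
moved = relocate(image, fixups=[4], link_base=0x1000, load_base=0x8000)
print(hex(struct.unpack_from("<I", moved, 4)[0]))
```

Only the locations named in the fix-up list are changed; ordinary data such as the first word is left alone, which is why the executable has to carry relocation information alongside its code.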

OS/360 and derivatives

In OS/360 and descendant systems, the (privileged) operating system facility is called IEWFETCH, [2] and is an internal component of the OS Supervisor, whereas the (non-privileged) LOADER application can perform many of the same functions, plus those of the Linkage Editor, and is entirely external to the OS Supervisor (although it certainly uses many Supervisor services).

IEWFETCH utilizes highly specialized channel programs, and it is theoretically possible to load and to relocate an entire executable within one revolution of the DASD media (about 16.6 ms maximum, 8.3 ms average, on "legacy" 3,600 rpm drives). For load modules which exceed a track in size, it is also possible to load and to relocate the entire module without losing a revolution of the media.
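The timing figures follow directly from the rotation speed. A minimal check, with a hypothetical helper name:

```python
def rotation_times_ms(rpm):
    """Return (time for one full revolution, average rotational delay) in ms."""
    full = 60_000 / rpm  # milliseconds per minute divided by revolutions per minute
    return full, full / 2

full_ms, average_ms = rotation_times_ms(3600)
print(f"{full_ms:.2f} ms per revolution, {average_ms:.2f} ms on average")
```

At 3,600 rpm this gives roughly 16.67 ms for a full revolution and 8.33 ms for half of one, matching the approximate figures quoted above.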

IEWFETCH also incorporates facilities for so-called overlay structures, which facilitate running potentially very large executables in a minimum memory model (as small as 44 KB on some versions of the OS, but 88 KB and 128 KB are more common).

The OS's nucleus (the always resident portion of the Supervisor) itself is formatted in a way that is compatible with a stripped-down version of IEWFETCH. Unlike normal executables, the OS's nucleus is "scatter loaded": parts of the nucleus are loaded into different portions of memory; in particular, certain system tables are required to reside below the initial 64 KB, while other tables and code may reside elsewhere.

The system's Linkage Editor application is named IEWL. [3] IEWL's main function is to associate load modules (executable programs) and object modules (the output from, say, assemblers and compilers), including "automatic calls" to libraries (high-level language "built-in functions"), into a format which may be most efficiently loaded by IEWFETCH. There are a large number of editing options, but for a conventional application only a few of these are commonly employed.

The load module format includes an initial "text record", followed immediately by the "relocation and/or control record" for that text record, followed by more instances of text record and relocation and/or control record pairs, until the end of the module.

The text records are usually very large, whereas the relocation and/or control records are small, because IEWFETCH's three relocation and/or control record buffers are fixed at 260 bytes. Smaller relocation and/or control records are possible, but 260 bytes is the maximum. IEWL enforces this limit by inserting additional relocation records, as required, before the next text record; in this special case, the sequence of records may be: ..., text record, relocation record, ..., control record, text record, ...
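The record sequence can be modelled in a few lines. The sketch below is a simplified model and not the actual on-disk load module layout: it treats relocation data as opaque bytes, does not distinguish relocation records from control records, and uses hypothetical names throughout. Only the 260-byte limit comes from the text.

```python
MAX_RECORD_BYTES = 260  # relocation/control buffer size given in the text

def split_relocation_data(data: bytes, limit: int = MAX_RECORD_BYTES):
    """Cut relocation data into consecutive records of at most `limit` bytes."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]

def build_module(sections):
    """Turn (text, relocation_data) pairs into a flat list of (kind, payload)."""
    records = []
    for text, reloc in sections:
        records.append(("text", text))
        # Every text record is followed by at least one small record,
        # and by several when the relocation data exceeds the limit.
        for piece in split_relocation_data(reloc) or [b""]:
            records.append(("rld", piece))
    return records

module = build_module([(b"T" * 1000, b"R" * 600), (b"T" * 500, b"R" * 40)])
print([(kind, len(payload)) for kind, payload in module])
```

The first section needs three small records (260, 260 and 80 bytes) before the next text record, which corresponds to the special case of extra relocation records described above.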

A special byte within the relocation and/or control record buffer is used as a "disabled bit spin" communication area, and is initialized to a unique value. The Read CCW for that relocation and/or control record has the Program Controlled Interrupt bit set. The processor is thereby notified when that CCW has been accessed by the channel via a special IOS exit. At this point the processor enters the "disabled bit spin" loop (sometimes called "the shortest loop in the world"). Once that byte changes from its initialized value, the CPU exits the bit spin, and relocation occurs during the "gap" within the media between the relocation and/or control record and the next text record. If relocation is finished before the next record, the NOP CCW following the Read will be changed to a TIC, and loading and relocating will proceed using the next buffer; if not, then the channel will stop at the NOP CCW until it is restarted by IEWFETCH via another special IOS exit. The three buffers form a continuous circular queue, each pointing to the next and the last pointing to the first, and the three buffers are constantly reused as loading and relocating proceeds.
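The circular reuse of the three buffers can be pictured with a small ring of linked objects. This sketch models only the ring structure; channel programs, interrupts and timing are left out, and all names are hypothetical.

```python
class Buffer:
    """One relocation/control record buffer in the ring."""
    def __init__(self, name):
        self.name = name
        self.next = None     # link to the following buffer in the ring
        self.payload = None  # record most recently read into this buffer

def make_ring(count=3):
    """Build `count` buffers, each pointing to the next, the last to the first."""
    buffers = [Buffer(f"buf{i}") for i in range(count)]
    for i, buf in enumerate(buffers):
        buf.next = buffers[(i + 1) % count]
    return buffers[0]

def process(records, ring_head):
    """Feed records through the ring; return the buffer used for each record."""
    usage = []
    current = ring_head
    for record in records:
        current.payload = record    # the channel reads a record into the buffer
        usage.append(current.name)  # relocation would use this buffer's contents
        current = current.next      # move on to the next buffer in the ring
    return usage

print(process(["r0", "r1", "r2", "r3", "r4"], make_ring()))
```

With five records and three buffers, the fourth record lands in the first buffer again, so the same storage serves a module of any length.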

IEWFETCH can, thereby, load and relocate a load module of any practical size, and in the minimum possible time.

Dynamic linkers

Dynamic linking loaders are another type of loader; they load shared libraries (such as .so, .dll, or .dylib files) and link them to programs that are already loaded and running.

Where such shared libraries can be shared by multiple processes, with a single copy of the shared code possibly appearing at a different (virtual) address in each process's address space, the code in the shared library is required to be relocatable, i.e. the library must use only self-relative or code segment base-relative internal addresses throughout. Some processors have instructions that can use self-relative code references in order to facilitate this.
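A small model shows why self-relative references let one copy of the code work at any address. The sketch assumes a displacement measured from the referencing location itself; real instruction sets often measure from the following instruction instead. The offsets and base addresses are made up for the demonstration.

```python
def relative_target(base, slot_offset, displacement):
    """Resolve a self-relative reference: address of the slot plus displacement."""
    return base + slot_offset + displacement

# Hypothetical library layout: a reference at offset 0x20 to a function at 0x100.
SLOT, FUNC = 0x20, 0x100
DISPLACEMENT = FUNC - SLOT  # fixed when the library is built, never patched

# The same stored displacement resolves correctly at two different bases.
resolved = [
    relative_target(base, SLOT, DISPLACEMENT)
    for base in (0x7F0000000000, 0x7F1234560000)
]
print([hex(address) for address in resolved])
```

Since the stored displacement does not depend on the base, the code bytes are identical in every process, which is what allows them to be shared.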

References

  1. "exec". The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition. The Open Group. Retrieved 2008-06-23.
  2. IBM Corporation (1972). IBM OS MVT Supervisor (PDF).
  3. IBM Corporation (1972). IBM OS Linkage Editor and Loader (PDF).