Prelink

Last updated

In computing, prebinding, also called prelinking, is a method for optimizing application load times by resolving library symbols prior to launch.

Contents

Background

Most computer programs consist of code that requires external shared libraries to execute. These libraries are normally integrated with the program at run time by a loader, in a process called dynamic linking.

While dynamic linking has advantages in code size and management, there are drawbacks as well. Every time a program is run, the loader needs to resolve (find) the relevant libraries. Since libraries move around in memory, there is a performance penalty for resolution. This penalty increases for each additional library needing resolution.

Prelinking reduces this penalty by resolving libraries in advance. Afterward, resolution only occurs if the libraries have changed since being prelinked, such as following perhaps an upgrade.

Mac OS

Mac OS stores executables in the Mach-O file format.

Mac OS X

Mac OS X performs prebinding in the "Optimizing" stage of installing system software or certain applications.

Prebinding has changed a few times within the Mac OS X series. Before 10.2, prebinding only happened during the installation procedure (the aforementioned "Optimizing" stage). From 10.2 through 10.3 the OS checked for prebinding at launch time for applications, and the first time an application ran it would be prebound, making subsequent launches faster. This could also be manually run, which some OS-level installs did. In 10.4, only OS libraries were prebound. In 10.5 and later, Apple replaced prebinding with a dyld shared cache mechanism, [1] which provided better OS performance.

Linux

On Linux, prelinking is accomplished via the prelink program, a free program written by Jakub Jelínek of Red Hat for ELF binaries.

Performance results have been mixed[ clarification needed ], but it seems to aid systems with a large number of libraries, such as KDE. [2]

When run with the "-R" option, prelink will randomly select the address base where libraries are loaded. This selection makes a return-to-libc attack harder to perform because the addresses are unique to that system. The reason prelink does this is because kernel facilities supplying address space layout randomization (ASLR) for libraries cannot be used in conjunction with prelink without defeating the purpose of prelink and forcing the dynamic linker to perform relocations at program load time.

As stated, prelink and per-process library address randomization cannot be used in conjunction. In order to avoid completely removing this security enhancement, prelink supplies its own randomization; however, this does not help a general information leak caused by prelink. Attackers with the ability to read certain arbitrary files on the target system can discover where libraries are loaded in privileged daemons; often libc is enough as it is the most common library used in return-to-libc attacks.

By reading a shared library file such as libc, an attacker with local access can discover the load address of libc in every other application on the system. Since most programs link to libc, the libc library file always has to be readable; any attacker with local access may gather information about the address space of higher privileged processes. Local access may commonly be gained by shell accounts or Web server accounts that allow the use of CGI scripts, which may read and output any file on the system.[ citation needed ] Directory traversal vulnerabilities can be used by attackers without accounts if CGI script vulnerabilities are available.

Because prelink is often run periodically, typically every two weeks, the address of any given library has a chance of changing over time. prelink is often used in an incremental mode in which already prelinked libraries are not altered unless absolutely necessary, so a library may not change its base address when prelink is re-run. This gives any address derived a half-life of the period in which prelink is run. Also note that if a new version of the library is installed, the addresses changes.

Jakub Jelínek points out that position independent executables (PIE) ignore prelinking on Red Hat Enterprise Linux and Fedora, and recommends that network and SUID programs be built PIE to facilitate a more secure environment.

Issues

Occasionally, prelinking can cause issues with application checkpoint and restart libraries like blcr , [3] as well as other libraries (like OpenMPI) that use blcr internally. Specifically when checkpointing a program on one host, and trying to restart on a different host, the restarted program may fail with a segfault due to differences in host-specific library memory address randomization. [4] [5] [ unreliable source? ]

See also

Related Research Articles

Executable and Linkable Format Standard file format for executables, object code, shared libraries, and core dumps

In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

Linker (computing) Computer program which combines multiple object files into a single file

In computing, a linker or link editor is a computer system program that takes one or more object files and combines them into a single executable file, library file, or another "object" file.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.

System call Mechanism used by an application program to request service from the kernel of the operating system

In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

Library (computing) Collection of non-volatile resources used by computer programs, often for software development.

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

Executable A file that causes a computer to follow indicated instructions

In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file that must be interpreted (parsed) by a program to be meaningful.

glibc Standard C Library of the GNU Project

The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++. It was started in the 1980s by the Free Software Foundation (FSF) for the GNU operating system.

The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. Starting from the original ANSI C standard, it was developed at the same time as the C library POSIX specification, which is a superset of it. Since ANSI C was adopted by the International Organization for Standardization, the C standard library is also called the ISO C library.

In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves reading the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded in a location in each program address space where it does not overlap with other memory in use. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.

Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

In computer science, a static library or statically-linked library is a set of routines, external functions and variables which are resolved in a caller at compile-time and copied into a target application by a compiler, linker, or binder, producing an object file and a stand-alone executable. This executable and the process of compiling it are both known as a static build of the program. Historically, libraries could only be static. Static libraries are either merged with other static libraries and object files during building/linking to form a single executable or loaded at run-time into the address space of their corresponding executable at a static memory offset determined at compile-time/link-time.

Dynamic-link library (DLL) is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX, or DRV . The file formats for DLLs are the same as for Windows EXE files – that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any combination.

BMDFM

Binary Modular Dataflow Machine (BMDFM) is software that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

In computing, rpath designates the run-time search path hard-coded in an executable file or library. Dynamic linking loaders use the rpath to find required libraries.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

Dynamic loading is a mechanism by which a computer program can, at run time, load a library into memory, retrieve the addresses of functions and variables contained in the library, execute those functions or access those variables, and unload the library from memory. It is one of the 3 mechanisms by which a computer program can use some other software; the other two are static linking and dynamic linking. Unlike static linking and dynamic linking, dynamic loading allows a computer program to start up in the absence of these libraries, to discover available libraries, and to potentially gain additional functionality.

Bionic is an implementation of the standard C library, developed by Google for its Android operating system. It differs from the GNU C Library (glibc) in being designed for devices with less memory and processor power than a typical Linux system. It is based on code from OpenBSD released under a BSD license, rather than glibc, which uses the GNU Lesser General Public License. This difference was important in the early days of Android, when static linking was common, and is still helpful in introducing Android to software companies used to proprietary operating systems, who can be wary of the LGPL, and unclear about the differences between it and the full GNU General Public License (GPL).

References

  1. "Manual Page for update_prebinding". Apple Developer Connection. Apple Computer Inc.
  2. Crasta, James (2004-05-17). "ELF Prelinking and what it can do for you" . Retrieved 2006-05-10.
  3. blcr
  4. "BLCR FAQ" . Retrieved 2012-01-05.
  5. Hursey, Josh (2011-12-29). "segfault when resuming on different host". OpenMPI Users (Mailing list). Retrieved 2012-01-05.

Further reading