Portable Executable

Last updated
Portable Executable
Filename extension
.acm, .ax, .cpl, .dll, .drv, .efi, .exe , .mui, .ocx, .scr, .sys, .tsp, .mun
Internet media type
application/vnd.microsoft.portable-executable [1]
Developed byCurrently: Microsoft
Type of format Binary, executable, object, shared libraries
Extended from DOS MZ executable
COFF

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. [2] The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS (device driver), MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments. [3]

Contents

On Windows NT operating systems, PE currently supports the x86-32, x86-64 (AMD64/Intel 64), IA-64, ARM and ARM64 instruction set architectures (ISAs). Prior to Windows 2000, Windows NT (and thus PE) supported the MIPS, Alpha, and PowerPC ISAs. Because PE is used on Windows CE, it continues to support several variants of the MIPS, ARM (including Thumb), and SuperH ISAs. [4]

Analogous formats to PE are ELF (used in Linux and most other versions of Unix) and Mach-O (used in macOS and iOS).

History

Microsoft migrated to the PE format from the 16-bit NE formats with the introduction of the Windows NT 3.1 operating system. All later versions of Windows, including Windows 95/98/ME and the Win32s addition to Windows 3.1x, support the file structure. The format has retained limited legacy support to bridge the gap between DOS-based and NT systems. For example, PE/COFF headers still include a DOS executable program, which is by default a DOS stub that displays a message like "This program cannot be run in DOS mode" (or similar), though it can be a full-fledged DOS version of the program (a later notable case being the Windows 98 SE installer). [5] This constitutes a form of fat binary. PE also continues to serve the changing Windows platform. Some extensions include the .NET PE format (see below), a version with 64-bit address space support called PE32+, [6] and a specification for Windows CE.

Technical details

Layout

Structure of a Portable Executable 32 bit Portable Executable 32 bit Structure in SVG fixed.svg
Structure of a Portable Executable 32 bit

A PE file consists of a number of headers and sections that tell the dynamic linker how to map the file into memory. An executable image consists of several different regions, each of which requires different memory protection; so the start of each section must be aligned to a page boundary. [7] For instance, typically the .text section (which holds program code) is mapped as execute/read-only, and the .data section (holding global variables) is mapped as no-execute/read write. However, to avoid wasting space, the different sections are not page aligned on disk. Part of the job of the dynamic linker is to map each section to memory individually and assign the correct permissions to the resulting regions, according to the instructions found in the headers. [8]

Import table

One section of note is the import address table (IAT), which is used as a lookup table when the application is calling a function in a different module. It can be in the form of both import by ordinal and import by name. Because a compiled program cannot know the memory location of the libraries it depends upon, an indirect jump is required whenever an API call is made. As the dynamic linker loads modules and joins them together, it writes actual addresses into the IAT slots, so that they point to the memory locations of the corresponding library functions. Though this adds an extra jump over the cost of an intra-module call resulting in a performance penalty, it provides a key benefit: The number of memory pages that need to be copy-on-write changed by the loader is minimized, saving memory and disk I/O time. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can produce more optimized code that simply results in an indirect call opcode. [8]

Relocations

PE files normally do not contain position-independent code. Instead they are compiled to a preferred base address , and all addresses emitted by the compiler/linker are fixed ahead of time. If a PE file cannot be loaded at its preferred address (because it's already taken by something else), the operating system will rebase it. This involves recalculating every absolute address and modifying the code to use the new values. The loader does this by comparing the preferred and actual load addresses, and calculating a delta value. This is then added to the preferred address to come up with the new address of the memory location. Base relocations are stored in a list and added, as needed, to an existing memory location. The resulting code is now private to the process and no longer shareable, so many of the memory saving benefits of DLLs are lost in this scenario. It also slows down loading of the module significantly. For this reason rebasing is to be avoided wherever possible, and the DLLs shipped by Microsoft have base addresses pre-computed so as not to overlap. In the no rebase case PE therefore has the advantage of very efficient code, but in the presence of rebasing the memory usage hit can be expensive. This contrasts with ELF where fully position-independent code is usually preferred to load-time relocation, thus trading off execution time in favor of lower memory usage.

.NET, metadata, and the PE format

In a .NET executable, the PE code section contains a stub that invokes the CLR virtual machine startup entry, _CorExeMain or _CorDllMain in mscoree.dll, much like it was in Visual Basic executables. The virtual machine then makes use of .NET metadata present, the root of which, IMAGE_COR20_HEADER (also called "CLR header") is pointed to by IMAGE_DIRECTORY_ENTRY_COMHEADER [9] entry in the PE header's data directory. IMAGE_COR20_HEADER strongly resembles PE's optional header, essentially playing its role for the CLR loader. [4]

The CLR-related data, including the root structure itself, is typically contained in the common code section, .text. It is composed of a few directories: metadata, embedded resources, strong names and a few for native-code interoperability. Metadata directory is a set of tables that list all the distinct .NET entities in the assembly, including types, methods, fields, constants, events, as well as references between them and to other assemblies.

Use on other operating systems

The PE format is also used by ReactOS, as ReactOS is intended to be binary-compatible with Windows. It has also historically been used by a number of other operating systems, including SkyOS and BeOS R3. However, both SkyOS and BeOS eventually moved to ELF.[ citation needed ]

As the Mono development platform intends to be binary compatible with the Microsoft .NET Framework, it uses the same PE format as the Microsoft implementation. The same goes for Microsoft's own cross-platform .NET Core.

On x86(-64) Unix-like operating systems, Windows binaries (in PE format) can be executed with Wine. The HX DOS Extender also uses the PE format for native DOS 32-bit binaries, plus it can, to some degree, execute existing Windows binaries in DOS, thus acting like an equivalent of Wine for DOS.

On IA-32 and x86-64 Linux one can also run Windows' DLLs under load library. [10]

Mac OS X 10.5 has the ability to load and parse PE files, but is not binary compatible with Windows. [11]

UEFI and EFI firmware use Portable Executable files as well as the Windows ABI x64 calling convention for applications.

See also

Related Research Articles

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format, is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

In computing, DLL Hell is a term for the complications that arise when one works with dynamic-link libraries (DLLs) used with Microsoft Windows operating systems, particularly legacy 16-bit editions, which all run in a single memory space.

<span class="mw-page-title-main">FASM</span> Open source assembler for x86 processors

FASM is an assembler for x86 processors. It supports Intel-style assembly language on the IA-32 and x86-64 computer architectures. It claims high speed, size optimizations, operating system (OS) portability, and macro abilities. It is a low-level assembler and intentionally uses very few command-line options. It is free and open-source software.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

In computer programming, a magic number is any of the following:

In computing, rebasing is the process of modifying data based on one reference to another. It can be one of the following:

.exe is a common filename extension denoting an executable file for Microsoft Windows, OS/2, and DOS.

A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.

<span class="mw-page-title-main">COM file</span> Type of simple executable file

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system. With the introduction of Digital Research's CP/M, the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.

<span class="mw-page-title-main">UEFI</span> Operating system and firmware specification

Unified Extensible Firmware Interface is a specification that defines the architecture of the platform firmware used for booting the computer hardware and its interface for interaction with the operating system. Examples of firmware that implement the specification are AMI Aptio, Phoenix SecureCore, TianoCore EDK II, InsydeH2O. UEFI replaces the BIOS which was present in the boot ROM of all personal computers that are IBM PC compatible, although it can provide backwards compatibility with the BIOS using CSM booting. Intel developed the original Extensible Firmware Interface (EFI) specification. Some of the EFI's practices and data formats mirror those of Microsoft Windows. In 2005, UEFI deprecated EFI 1.10.

The Microsoft Macro Assembler (MASM) is an x86 assembler that uses the Intel syntax for MS-DOS and Microsoft Windows. Beginning with MASM 8.0, there are two versions of the assembler: One for 16-bit & 32-bit assembly sources, and another (ML64) for 64-bit sources only.

<span class="mw-page-title-main">UPX</span>

UPX is a free and open source executable packer supporting a number of file formats from different operating systems.

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

<span class="mw-page-title-main">GUID Partition Table</span> Computer disk partitioning standard

The GUID Partition Table (GPT) is a standard for the layout of partition tables of a physical computer storage device, such as a hard disk drive or solid-state drive, using universally unique identifiers, which are also known as globally unique identifiers (GUIDs). Forming a part of the Unified Extensible Firmware Interface (UEFI) standard, it is nevertheless also used for some BIOSs, because of the limitations of master boot record (MBR) partition tables, which use 32 bits for logical block addressing (LBA) of traditional 512-byte disk sectors.

Dynamic-link library (DLL) is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems. These libraries usually have the file extension DLL, OCX, or DRV . The file formats for DLLs are the same as for Windows EXE files – that is, Portable Executable (PE) for 32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can contain code, data, and resources, in any combination.

The New Executable is a 16-bit executable file format, a successor to the DOS MZ executable format. It was used in Windows 1.0–3.x, Windows 9x, multitasking MS-DOS 4.0, OS/2 1.x, and the OS/2 subset of Windows NT up to version 5.0. A NE is also called a segmented executable.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

The Object Module Format (OMF) is an object file format used primarily for software intended to run on Intel 80x86 microprocessors. It was originally developed by Intel around 1975–1977 for ISIS-II, targetting the 8-bit 8080/8085 processors. This variant later became known as OMF-80. As OMF-86 it was adapted to the 16-bit 8086 processor in 1978. Version 4.0 for the 8086 family was released in 1981 under the name Relocatable Object Module Format, and is perhaps best known to DOS users as an .OBJ file. Versions for the 80286 (OMF-286) and the 32-bit 80386 processors (OMF-386) were introduced in 1981 and 1985, respectively. It has since been standardized by the Tool Interface Standards Committee and was also extended by Microsoft and IBM (IBM-OMF). Intel also adapted the format to the 8051 microcontroller (OMF-51 and AOMF).

This is a comparison of binary executable file formats which, once loaded by a suitable executable loader, can be directly executed by the CPU rather than being interpreted by software. In addition to the binary application code, the executables may contain headers and tables with relocation and fixup information as well as various kinds of meta data. Among those formats listed, the ones in most common use are PE, ELF, Mach-O and MZ.

References

  1. Andersson, Henrik (2015-04-23). "application/vnd.microsoft.portable-executable". IANA. Retrieved 2017-03-26.
  2. "Portable executable (PE) - Definition - Trend Micro IN". www.trendmicro.com. Retrieved 2022-11-10.
  3. "UEFI Specification, version 2.8B" (PDF)., a note on p.15, states that "this image type is chosen to enable UEFI images to contain Thumb and Thumb2 instructions while defining the EFI interfaces themselves to be in ARM mode."
  4. 1 2 "PE Format (Windows)" . Retrieved 2017-10-21.
  5. E.g. Microsoft's linker has /STUB switch to attach one
  6. In order to know whether the executable code is 32- or 64-bit, check the Machine field in the IMAGE_FILE_HEADER. (PE trick explained: Telling 32 and 64 bit apart with naked eye by Karsten Hahn)
    To see if addresses in the executable are 32- or 64-bit, check the Magic field in the IMAGE_OPTIONAL_HEADER. 10B16 indicates a PE32 file, whereas 20B16 indicates a PE32+ file. (PE Format at Microsoft.com)
  7. "The Portable Executable File From Top to Bottom" . Retrieved 2017-10-21.
  8. 1 2 "Peering Inside the PE: A Tour of the Win32 Portable Executable File" . Retrieved 2017-10-21.
  9. The entry was previously used for COM+ metadata in COM+ applications, hence the name
  10. "GitHub - taviso/Loadlibrary: Porting Windows Dynamic Link Libraries to Linux". GitHub .
  11. Chartier, David (2007-11-30). "Uncovered: Evidence that Mac OS X could run Windows apps soon". Ars Technica. Retrieved 2007-12-03. ... Steven Edwards describes the discovery that Leopard apparently contains an undocumented loader for Portable Executables, a type of file used in 32-bit and 64-bit versions of Windows. More poking around revealed that Leopard's own loader tries to find Windows DLL files when attempting to load a Windows binary.