Portable Executable

Last updated
Portable Executable
Filename extension
.acm, .ax, .cpl, .dll, .drv, .efi, .exe , .mui, .ocx, .scr, .sys, .tsp, .mun, .msstyles
Internet media type
application/vnd.microsoft.portable-executable [1]
Developed byCurrently: Microsoft
Type of format Binary, executable, object, shared libraries
Extended from DOS MZ executable
COFF

The Portable Executable (PE) format is a file format for executables, object code, dynamic-link-libraries (DLLs), and binary files used on 32-bit and 64-bit Windows operating systems, as well as in UEFI environments. [2] It is the standard format for executables on Windows NT-based systems, including files such as .exe, .dll, .sys (for system drivers), and .mui. At its core, the PE format is a structured data container that gives the Windows operating system loader everything it needs to properly manage the executable code it contains. This includes references for dynamically linked libraries, tables for importing and exporting API s, resource management data and thread-local storage (TLS) information.

Contents

According to the Unified Extensible Firmware Interface (UEFI) specification, the PE format is also the accepted standard for executables in EFI environments. [3] On Windows NT systems, it currently supports a range of instruction sets, including IA-32, x86-64 (AMD64/Intel 64), IA-64, ARM and ARM64. Before the advent of Windows 2000, Windows NT (and by extension the PE format) also supported MIPS, Alpha, and PowerPC architectures. Moreover, thanks to its use in Windows CE, PE has maintained compatibility with several MIPS, ARM (including Thumb), and SuperH variants. [4]

Functionally, the PE format is similar to other platform-specific executable formats, such as the ELF format used in Linux and most Unix-like systems, and the Mach-O format found in macOS and iOS.

History

Microsoft first introduced the PE format with Windows NT 3.1, replacing the older 16-bit New Executable (NE) format. Soon after, Windows 95, 98, ME, and the Win32s extension for Windows 3.1x, all adopted the PE structure. Each PE file includes a DOS executable header, which generally displays the message "This program cannot be run in DOS mode". However, this DOS section can be replaced by a fully functional DOS program, as demonstrated in the Windows 98 SE installer. Developers can add such a program using the /STUB switch with Microsoft's linker, effectively creating a fat binary. [5]

Over time, the PE format has grown with the Windows platform. Notable extensions include the .NET PE format for managed code, PE32+ for 64-bit address space support, and a specialized version for Windows CE.

To determine whether a PE file is intended for 32-bit or 64-bit architectures, one can examine the Machine field in the IMAGE_FILE_HEADER. [6] Common machine values are 0x014c for 32-bit Intel processors and 0x8664 for x64 processors. Additionally, the Magic field in the IMAGE_OPTIONAL_HEADER reveals whether addresses are 32-bit or 64-bit. A value of 0x10B indicates a 32-bit (PE32) file, while 0x20B indicates a 64-bit (PE32+) file. [7]

Technical details

Layout

Structure of a Portable Executable 32 bit Portable Executable 32 bit Structure in SVG fixed.svg
Structure of a Portable Executable 32 bit

A PE file consists of a several headers and sections that instruct the dynamic linker about on how to map the file into memory. An executable image consists of several different regions, each requiring different memory protection attributres. To ensure proper alignment, the start of each section must align to a page boundary. [8] For instance, the .text section, which contains program code, is typically mapped as an execute/read-only. Conversely, the .data section, which holds global variables, is mapped as no-execute/read write. However, to conserve space, sections are not aligned on disk in this manner. The dynamic linker maps each section to memory individually and assigns the correct permissions based on the information in the headers. [9]

Import table

The import address table (IAT) is used as a lookup table when the application calls a function in a different module. The imports can be specified by ordinal or by name. Because a compiled program cannot know the memory locations of its dependent libraries beforehand, an indirect jump is necessary for API calls. As the dynamic linker holds modules and resolves dependancies, it populates the IAT slots with actual addresses of the corresponding library functions. Although this adds an extra jump, incurring a performance penalty compared to intermodular calls, it minimizes the number of memory pages that that require copy-on-write changes, thus conserving memory and disk I/O. If a call is known to be intermodular beforehand (if indicated by a dllimport attribute), the compiler can generate optimized code with a simple indirect call opcode. [9]

Address Space Layout Randomization (ASLR)

PE files aren't position-independent by default; they are compiled to run at a specific, fixed memory address. Modern operating systems use Address Space Layout Randomization (ASLR) to make it harder for attackers to exploit memory-related vulnerabilities. ASLR works by randomly changing the memory address of important parts of the program every time it's loaded. This includes the base address of the program itself, shared libraries (DLLs), and memory areas like the heap and stack. as a defense mechanism against memory-based exploits. ASLR rearranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries. By randomizing these memory addresses each time the process an application is loaded, ASLR prevents attackers from being able to reliably predict memory locations.

.NET, metadata, and the PE format

In a .NET executable, the PE code section contains a stub that invokes the CLR virtual machine startup entry, _CorExeMain or _CorDllMain in mscoree.dll, much like it was in Visual Basic executables. The virtual machine then makes use of .NET metadata present, the root of which, IMAGE_COR20_HEADER (also called "CLR header") is pointed to by IMAGE_DIRECTORY_ENTRY_COMHEADER (the entry was previously used for COM+ metadata in COM+ applications, hence the name[ citation needed ]) entry in the PE header's data directory. IMAGE_COR20_HEADER strongly resembles PE's optional header, essentially playing its role for the CLR loader. [4]

The CLR-related data, including the root structure itself, is typically contained in the common code section, .text. It is composed of a few directories: metadata, embedded resources, strong names and a few for native-code interoperability. Metadata directory is a set of tables that list all the distinct .NET entities in the assembly, including types, methods, fields, constants, events, as well as references between them and to other assemblies.

Use on other operating systems

The PE format is also used by ReactOS, an open-source operating system created to be binary-compatible with Windows. Historically, it has also been used by other operating systems such as SkyOS and BeOS R3. However, both SkyOS and BeOS eventually moved to ELF.[ citation needed ]

The Mono development platform, which aims to be binary compatible with the Microsoft .NET Framework, uses the same PE format as the Microsoft implementation. The same goes for Microsoft's own cross-platform .NET Core.

On x86(-64) Unix-like operating systems, Windows binaries (in PE format) can be executed using Wine. The HX DOS Extender also uses the PE format for native DOS 32-bit binaries, and can execute some Windows binaries in DOS, thus acting like an equivalent of Wine for DOS.

Mac OS X 10.5 has the ability to load and parse PE files, although it does not maintain binary compatibility with Windows. [10]

UEFI and EFI firmware use PE files as well as the Windows ABI x64 calling convention for applications.

See also

Related Research Articles

<span class="mw-page-title-main">Executable and Linkable Format</span> Standard file format for executables, object code, shared libraries, and core dumps.

In computing, the Executable and Linkable Format is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.

<span class="mw-page-title-main">Windows API</span> Microsofts core set of application programming interfaces on Windows

The Windows API, informally WinAPI, is the foundational application programming interface (API) that allows a computer program to access the features of the Microsoft Windows operating system in which the program is running. Programs access API functionality via dynamic-link library (DLL) technology.

<span class="mw-page-title-main">FASM</span> Open source assembler for x86 processors

FASM is an assembler for x86 processors. It supports Intel-style assembly language on the IA-32 and x86-64 computer architectures. It claims high speed, size optimizations, operating system (OS) portability, and macro abilities. It is a low-level assembler and intentionally uses very few command-line options. It is free and open-source software.

The Common Object File Format (COFF) is a format for executable, object code, and shared library computer files used on Unix systems. It was introduced in Unix System V, replaced the previously used a.out format, and formed the basis for extended specifications such as XCOFF and ECOFF, before being largely replaced by ELF, introduced with SVR4. COFF and its variants continue to be used on some Unix-like systems, on Microsoft Windows, in UEFI environments and in some embedded development systems.

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

In computer programming, a magic number is any of the following:

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that executes properly regardless of its memory address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

For Microsoft Windows, OS/2, and DOS, .exe is the filename extension that denotes a file as being executable – a computer program – containing an entry point.

A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.

<span class="mw-page-title-main">COM file</span> Type of simple executable file

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system. With the introduction of Digital Research's CP/M, the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.

Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably redirecting code execution to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

The Microsoft Macro Assembler (MASM) is an x86 assembler that uses the Intel syntax for MS-DOS and Microsoft Windows. Beginning with MASM 8.0, there are two versions of the assembler: One for 16-bit & 32-bit assembly sources, and another (ML64) for 64-bit sources only.

<span class="mw-page-title-main">UPX</span> Executable packer software

UPX is a free and open source executable packer supporting a number of file formats from different operating systems.

Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects are absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.

A dynamic-link library (DLL) is a shared library in the Microsoft Windows or OS/2 operating system.

The New Executable is a 16-bit executable file format, a successor to the DOS MZ executable format. It was used in Windows 1.0–3.x, Windows 9x, multitasking MS-DOS 4.0, OS/2 1.x, and the OS/2 subset of Windows NT up to version 5.0. An NE is also called a segmented executable. It utilizes the 286 protected mode or unreal mode.

In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.

The DOS MZ executable format is the executable file format used for .EXE files in DOS.

In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit, or in some cases software emulation of those features. However, technologies that emulate or supply an NX bit will usually impose a measurable overhead while using a hardware-supplied NX bit imposes no measurable overhead.

Open Watcom Assembler or WASM is an x86 assembler produced by Watcom, based on the Watcom Assembler found in Watcom C/C++ compiler and Watcom FORTRAN 77. Further development is being done on the 32- and 64-bit JWASM project, which more closely matches the syntax of Microsoft's assembler.

References

  1. Andersson, Henrik (2015-04-23). "application/vnd.microsoft.portable-executable". IANA. Retrieved 2017-03-26.
  2. "Portable executable (PE) - Definition - Trend Micro IN". www.trendmicro.com. Retrieved 2022-11-10.
  3. "UEFI Specification, version 2.8B" (PDF)., a note on p.15, states that "this image type is chosen to enable UEFI images to contain Thumb and Thumb2 instructions while defining the EFI interfaces themselves to be in ARM mode."
  4. 1 2 "PE Format (Windows)" . Retrieved 2017-10-21.
  5. "/STUB (MS-DOS Stub File Name)". 3 August 2021.
  6. PE trick explained: Telling 32 and 64 bit apart with naked eye by Karsten Hahn
  7. PE Format at Microsoft.com
  8. "The Portable Executable File From Top to Bottom" . Retrieved 2017-10-21.
  9. 1 2 "Peering Inside the PE: A Tour of the Win32 Portable Executable File". 30 June 2010. Retrieved 2017-10-21.
  10. Chartier, David (2007-11-30). "Uncovered: Evidence that Mac OS X could run Windows apps soon". Ars Technica. Retrieved 2007-12-03. ... Steven Edwards describes the discovery that Leopard apparently contains an undocumented loader for Portable Executables, a type of file used in 32-bit and 64-bit versions of Windows. More poking around revealed that Leopard's own loader tries to find Windows DLL files when attempting to load a Windows binary.