This article needs additional citations for verification .(February 2009) |
Filename extension | .COM |
---|---|
Internet media type | application/x-dosexec |
Type of format | Executable |
Extended to | DOS MZ executable |
A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM
was used as a filename extension for text files containing commands to be issued to the operating system (similar to a batch file). [1] With the introduction of Digital Research's CP/M (a microcomputer operating system), the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.
The .COM
file name extension has no relation to the .com (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by malware writers.
The COM format is the original binary executable format used in CP/M (including SCP and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files), [2] and contains no standard metadata, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KB) and stores all its code and data in one segment.
Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size): the entry point is fixed at 0100h. [nb 1] This was not an issue on 8-bit machines since they can address 64k of memory max, but 16-bit machines have a much larger address space, which is why the format fell out of use.
In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed. [nb 1] COM files fit this model perfectly. Before the introduction of MP/M and Concurrent CP/M, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other.
Although the file format is the same in DOS and CP/M, .COM files for the two operating systems are not compatible; DOS COM files contain x86 instructions and possibly DOS system calls, while CP/M COM files contain 8080 instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for 8085 or Z80).
.COM files in DOS set all x86 segment registers to the same value and the SP (stack pointer) register to the offset of the last word available in the first 64 KB segment (typically FFFEh) or the maximum size of memory available in the block the program is loaded into for both, the program plus at least 256 bytes stack, whatever is smaller, thus the stack begins at the very top of the corresponding memory segment and works down from there. [3] [4]
In the original DOS 1.x API, which was a derivative of the CP/M API, program termination of a .COM file would be performed by calling the INT 20h (Terminate Program) function or else INT 21h Function 0, which served the same purpose, and the programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although this could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segment to be set to the same value.
It is possible to make a .COM file to run under both operating systems in form of a fat binary. There is no true compatibility at the instruction level; the instructions at the entry point are chosen to be equal in functionality but different in both operating systems, and make program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use.
Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header; [2] since C9h corresponds to the 8080 instruction RET
, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, and it will cause a processor-generated interrupt 6 exception in v86 mode on the 386 and later x86 chips. Since C9h is the opcode for LEAVE since the 80188/80186 and therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash.
Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being MZ (4Dh 5Ah), the initials of Mark Zbikowski.
Under DOS there is no memory management provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or EXE programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler. [5] Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.
The format is still executable on many modern Windows NT-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can be executed also on DOS emulators such as DOSBox, on any platform supported by these emulators.
Windows NT-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days although they are in fact presently implemented as .exe files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect .com extension. (In fact any .exe file can be renamed .com and still execute correctly.) The use of the original .com extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are CHCP
, DISKCOMP
, DISKCOPY
, FORMAT
, MODE
, MORE
and TREE
. [6]
In DOS, if a directory contains both a COM file and an EXE file with same name, when no extension is specified the COM file is preferentially selected for execution. For example, if a directory in the system path contains two files named foo.com
and foo.exe
, the following would execute foo.com
:
C:\>foo
A user wishing to run foo.exe
can explicitly use the complete filename:
C:\>foo.exe
Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like notepad.com
for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe
. Again, these .com files may in fact contain a .exe format executable.
On Windows NT and derivatives (Windows 2000, Windows XP, Windows Vista, and Windows 7), the PATHEXT variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com
files before .exe
files. This closely resembles a feature previously found in JP Software's line of extended command line processors 4DOS, 4OS2, and 4NT.
Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the .com file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mails have been sent with attachment names similar to "www.example.com". Unwary Microsoft Windows users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/
, but instead would run the attached binary command file named www.example
, giving it full permission to do to their machine whatever its author had in mind.[ citation needed ]
There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.
CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/85-based microcomputers by Gary Kildall of Digital Research, Inc. CP/M is a disk operating system and its purpose is to organize files on a magnetic storage medium, and to load and run programs stored on a disk. Initially confined to single-tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations and were migrated to 16-bit processors.
The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, SYS, MUI and other file types. The Unified Extensible Firmware Interface (UEFI) specification states that PE is the standard executable format in EFI environments.
x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.
The A20, or address line 20, is one of the electrical lines that make up the system bus of an x86-based computer system. The A20 line in particular is used to transmit the 21st bit on the address bus.
For Microsoft Windows, OS/2, and DOS, .exe is the filename extension that denotes a file as being executable – a computer program – containing an entry point.
A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.
The Program Segment Prefix (PSP) is a data structure used in DOS systems to store the state of a program. It resembles the Zero Page in the CP/M operating system. The PSP has the following structure:
A File Control Block (FCB) is a file system structure in which the state of an open file is maintained. A FCB is managed by the operating system, but it resides in the memory of the program that uses the file, not in operating system memory. This allows a process to have as many files open at one time as it wants, provided it can spare enough memory for an FCB per file.
Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects are absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.
Easy_Kernal MTK In CP/M-86, Concurrent CP/M-86, Personal CP/M-86, S5-DOS, DOS Plus, Concurrent DOS, FlexOS, Multiuser DOS, System Manager and REAL/32 as well as by SCP1700, CP/K and K8918-OS, CMD is the filename extension used by CP/M-style executable programs. It corresponds to COM in CP/M-80 and EXE in DOS. The same extension is used by the command-line interpreter CMD.EXE in OS/2 and Windows for batch files.
A program information file (PIF) defines how a given DOS program should be run in a multi-tasking environment, especially in order to avoid giving it unnecessary resources which could remain available to other programs. TopView was the originator of PIFs; they were then inherited and extended by DESQview and Microsoft Windows, where they are most often seen. PIFs are seldom used today in software due to the absence of DOS applications.
In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed, by copying the content of libraries from persistent storage to RAM, filling jump tables and relocating pointers. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented.
The DOS MZ executable format is the executable file format used for .EXE files in DOS.
A source-to-source translator, source-to-source compiler, transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent source code in the same or a different programming language. A source-to-source translator converts between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher level programming language to a lower level programming language. For example, a source-to-source translator may perform a translation of a program from Python to JavaScript, while a traditional compiler translates from a language like C to assembly or Java to bytecode. An automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations or language constructs.
MSX-DOS is a discontinued disk operating system developed by Microsoft's Japan subsidiary for the 8-bit home computer standard MSX, and is a cross between MS-DOS v1.25 and CP/M-80 v2.2.
The line-oriented debugger DEBUG.EXE
is an external command in operating systems such as DOS, OS/2 and Windows.
In a general computing sense, overlaying means "the process of transferring a block of program code or other data into main memory, replacing what is already stored". Overlaying is a programming method that allows programs to be larger than the computer's main memory. An embedded system would normally use overlays because of the limitation of physical memory, which is internal memory for a system-on-chip, and the lack of virtual memory facilities.
A batch file is a script file in DOS, OS/2 and Microsoft Windows. It consists of a series of commands to be executed by the command-line interpreter, stored in a plain text file. A batch file may contain any command the interpreter accepts interactively and use constructs that enable conditional branching and looping within the batch file, such as IF
, FOR
, and GOTO
labels. The term "batch" is from batch processing, meaning "non-interactive execution", though a batch file might not process a batch of multiple data.
In computer programming, a self-relocating program is a program that relocates its own address-dependent instructions and data when run, and is therefore capable of being loaded into memory at any address. In many cases, self-relocating code is also a form of self-modifying code.
DOS is a family of disk-based operating systems for IBM PC compatible computers. The DOS family primarily consists of IBM PC DOS and a rebranded version, Microsoft's MS-DOS, both of which were introduced in 1981. Later compatible systems from other manufacturers include DR-DOS (1988), ROM-DOS (1989), PTS-DOS (1993), and FreeDOS (1998). MS-DOS dominated the IBM PC compatible market between 1981 and 1995.