Netwide Assembler

NASM
Original author(s) Simon Tatham, Julian Hall
Developer(s) H. Peter Anvin, Chang Seok Bae, Jim Kukunas, Frank B. Kotler, Cyrill Gorcunov
Initial release October 1996
Stable release 2.16.01 [1] / 21 December 2022
Written in Assembly, C [2]
Operating system Unix-like, Windows, OS/2, MS-DOS
Available in English
Type x86 assembler
License BSD 2-clause
Website www.nasm.us

The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture. It can be used to write 16-bit, 32-bit (IA-32) and 64-bit (x86-64) programs. It is considered one of the most popular assemblers for Linux and x86 chips. [3]

It was originally written by Simon Tatham with assistance from Julian Hall. As of 2016, it is maintained by a small team led by H. Peter Anvin. [4] It is open-source software released under the terms of a simplified (2-clause) BSD license. [5]

Features

NASM can output several binary formats, including COFF, OMF, a.out, Executable and Linkable Format (ELF), Mach-O and flat binary files (.bin, a binary disk image used in building operating systems), though position-independent code is supported only for ELF object files. It also has its own binary format called RDOFF. [6]

The variety of output formats allows retargeting programs to virtually any x86 operating system (OS). NASM can also create flat binary files, which are used to write boot loaders and read-only memory (ROM) images, and in various facets of OS development. [6] It can run on non-x86 platforms, such as PowerPC and SPARC, as a cross assembler, though it cannot generate programs usable by those machines.

NASM uses a variant of Intel assembly syntax instead of AT&T syntax. [7] It also avoids features such as automatic generation of segment overrides (and the related ASSUME directive) used by MASM and compatible assemblers. [6]

Sample programs

A "Hello, world!" program for the DOS operating system:

    section .text
    org 0x100
        mov     ah, 0x9
        mov     dx, hello
        int     0x21

        mov     ax, 0x4c00
        int     0x21

    section .data
    hello:  db 'Hello, world!', 13, 10, '$'
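The trailing '$' in the data matters: DOS function 09h (selected by AH = 9 before int 0x21) prints bytes starting at DS:DX until it reaches a '$'. The rule can be modelled with a minimal Python sketch; `dos_print_string` is a hypothetical helper used only for illustration and is not part of DOS or NASM.

```python
def dos_print_string(memory: bytes, offset: int) -> bytes:
    """Return the bytes that INT 21h / AH=09h would print from `offset`.

    Illustrative model only: output stops at the first '$', which is
    itself not printed.
    """
    end = memory.index(b"$", offset)  # raises ValueError if no '$' follows
    return memory[offset:end]


# The same bytes the sample stores at the label `hello` (13, 10 = CR, LF).
hello = b"Hello, world!" + bytes([13, 10]) + b"$"
printed = dos_print_string(hello, 0)
```

Without the terminator, the real service would keep printing whatever follows the string in memory, which is why the sample appends '$' after the CR/LF pair.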

An equivalent program for Linux:

    global _start

    section .text
    _start:
        mov     eax, 4        ; write
        mov     ebx, 1        ; stdout
        mov     ecx, msg
        mov     edx, msg.len
        int     0x80          ; write(stdout, msg, strlen(msg));

        xor     eax, msg.len  ; invert return value from write()
        xchg    eax, ebx      ; value for exit()
        mov     eax, 1        ; exit
        int     0x80          ; exit(...)

    section .data
    msg:    db "Hello, world!", 10
    .len:   equ $ - msg
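The xor/xchg pair turns the result of write() into the exit status: if write() returned exactly msg.len, the XOR yields zero, and any other result leaves a non-zero value in EBX. A small Python sketch of that arithmetic follows, assuming 32-bit registers and that only the low 8 bits of the status are reported to the parent process; `exit_status` is a hypothetical name.

```python
MSG = b"Hello, world!\n"  # same bytes as `msg` in the sample


def exit_status(write_result: int, msg_len: int = len(MSG)) -> int:
    """Model `xor eax, msg.len` followed by `xchg eax, ebx`."""
    ebx = (write_result ^ msg_len) & 0xFFFFFFFF  # 32-bit register value
    return ebx & 0xFF  # low 8 bits, as seen by the parent process


full_write = exit_status(len(MSG))  # every byte written
short_write = exit_status(5)        # only 5 of the 14 bytes written
```

Here `full_write` is 0 (success) and `short_write` is non-zero, so a shell checking the status can tell the two cases apart.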

An example of a similar program for Microsoft Windows:

    global _main
    extern _MessageBoxA@16
    extern _ExitProcess@4

    section code use32 class=code
    _main:
        push    dword 0       ; UINT uType = MB_OK
        push    dword title   ; LPCSTR lpCaption
        push    dword banner  ; LPCSTR lpText
        push    dword 0       ; HWND hWnd = NULL
        call    _MessageBoxA@16

        push    dword 0       ; UINT uExitCode
        call    _ExitProcess@4

    section data use32 class=data
    banner: db 'Hello, world!', 0
    title:  db 'Hello', 0
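The external names in this sample follow the 32-bit stdcall decoration: a leading underscore, the function name, and '@' followed by the number of bytes of arguments pushed on the stack. The sketch below reproduces that rule in Python; `stdcall_symbol` is a hypothetical helper, and it assumes each argument occupies a multiple of 4 bytes on the stack.

```python
def stdcall_symbol(name: str, arg_sizes: list) -> str:
    """Build the decorated 32-bit stdcall symbol for `name`.

    `arg_sizes` lists each argument's size in bytes; every argument is
    rounded up to a 4-byte stack slot before summing.
    """
    total = sum((size + 3) // 4 * 4 for size in arg_sizes)
    return "_{}@{}".format(name, total)


# MessageBoxA takes four 4-byte arguments; ExitProcess takes one.
message_box = stdcall_symbol("MessageBoxA", [4, 4, 4, 4])
exit_process = stdcall_symbol("ExitProcess", [4])
```

The results match the `extern` declarations in the sample, which is why four `push dword` instructions precede the call to `_MessageBoxA@16` and one precedes `_ExitProcess@4`.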

A 64-bit program for Apple OS X that inputs a keystroke and shows it on the screen:

    global _start

    section .data

        query_string:      db "Enter a character:  "
        query_string_len:  equ $ - query_string
        out_string:        db "You have input:  "
        out_string_len:    equ $ - out_string

    section .bss

        in_char:           resw 4

    section .text

    _start:
        mov     rax, 0x2000004        ; put the write-system-call-code into register rax
        mov     rdi, 1                ; tell kernel to use stdout
        mov     rsi, query_string     ; rsi is where the kernel expects to find the address of the message
        mov     rdx, query_string_len ; and rdx is where the kernel expects to find the length of the message
        syscall

        ; read in the character
        mov     rax, 0x2000003        ; read system call
        mov     rdi, 0                ; stdin
        mov     rsi, in_char          ; address for storage, declared in section .bss
        mov     rdx, 2                ; get 2 bytes from the kernel's buffer (one for the carriage return)
        syscall

        ; show user the output
        mov     rax, 0x2000004        ; write system call
        mov     rdi, 1                ; stdout
        mov     rsi, out_string
        mov     rdx, out_string_len
        syscall

        mov     rax, 0x2000004        ; write system call
        mov     rdi, 1                ; stdout
        mov     rsi, in_char
        mov     rdx, 2                ; the second byte is to apply the carriage return expected in the string
        syscall

        ; exit system call
        mov     rax, 0x2000001        ; exit system call
        xor     rdi, rdi
        syscall
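The constants loaded into RAX combine a system call class with a BSD system call number: the class for Unix calls is 2, stored above bit 24, and the numbers used here are 1 (exit), 3 (read) and 4 (write). A short Python sketch of that composition follows; the constant and function names are chosen for illustration.

```python
SYSCALL_CLASS_UNIX = 2    # class used for BSD/Unix system calls
SYSCALL_CLASS_SHIFT = 24  # the class occupies the bits above bit 24


def macos_syscall(bsd_number: int) -> int:
    """Combine the Unix class with a BSD system call number."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | bsd_number


SYS_EXIT = macos_syscall(1)
SYS_READ = macos_syscall(3)
SYS_WRITE = macos_syscall(4)
```

Formatting these with `hex()` gives the values that appear in the sample: 0x2000001, 0x2000003 and 0x2000004.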

Linking

NASM principally outputs object files, which are generally not executable by themselves. The only exception is flat binaries (e.g., .COM), [6] which are inherently limited in modern use. To translate the object files into executable programs, an appropriate linker must be used, such as the Visual Studio "LINK" utility for Windows or ld for Unix-like systems.
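As an illustration of the two-step workflow, the sketch below builds the command lines that would assemble the 32-bit Linux sample into an ELF object file and then link it with GNU ld. It assumes a Linux host with `nasm` and `ld` installed; the file name `hello.asm` and the helper `build_commands` are hypothetical. The commands are only constructed and printed, not executed.

```python
import shlex


def build_commands(source: str = "hello.asm") -> list:
    """Return the assemble and link commands as argument lists."""
    stem = source.rsplit(".", 1)[0]
    obj = stem + ".o"
    assemble = ["nasm", "-f", "elf32", source, "-o", obj]  # object file only
    link = ["ld", "-m", "elf_i386", "-o", stem, obj]       # produce executable
    return [assemble, link]


for argv in build_commands():
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to run the step. For the DOS sample no link step is needed, because `nasm -f bin` writes the flat binary directly.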

Development

NASM version 0.90 was released in October 1996. [5]

Version 2.00 was released on 28 November 2007, adding support for x86-64 extensions. [4] The development versions are not uploaded to SourceForge.net, but are checked into GitHub with binary snapshots available from the project web page.

A search engine for NASM documentation is also available. [8]

In July 2009, as of version 2.07, NASM was released under the Simplified (2-clause) BSD license. Previously, NASM was licensed under the LGPL, which led to the development of Yasm, a complete rewrite of NASM under the New BSD License. Yasm offered support for x86-64 earlier than NASM. It also added support for GNU Assembler syntax.

RDOFF

Relocatable Dynamic Object File Format (RDOFF) is used by developers to test the integrity of NASM's object file output abilities. It is based heavily on the internal structure of NASM, [9] essentially consisting of a header containing a serialization of the output driver function calls followed by an array of sections containing executable code or data. Tools for using the format, including a linker and loader, are included in the NASM distribution.

Until version 0.90 was released in October 1996, NASM supported output of only flat-format executable files (e.g., DOS COM files). In version 0.90, Simon Tatham added support for an object-file output interface, and for DOS .OBJ files for 16-bit code only. [10]

NASM thus lacked a 32-bit object format. To address this lack, and as an exercise to learn the object-file interface, developer Julian Hall put together the first version of RDOFF, which was released in NASM version 0.91. [10]

Since this initial version, there has been one major update to the RDOFF format. It added a record-length indicator to each header record, [11] allowing programs to skip over records whose format they do not recognise, and it added support for multiple segments; RDOFF1 supported only three segments: text, data and bss (containing uninitialized data). [9]
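The benefit of a per-record length can be shown with a small Python sketch. It assumes a simplified layout in which each header record is one type byte, one length byte, and then that many payload bytes; the type codes and names below are hypothetical and are not taken from the RDOFF specification.

```python
def iter_records(header: bytes):
    """Yield (type, payload) pairs from a type/length/payload stream."""
    pos = 0
    while pos < len(header):
        if pos + 2 > len(header):
            raise ValueError("truncated record header")
        rec_type, length = header[pos], header[pos + 1]
        payload = header[pos + 2 : pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated record payload")
        yield rec_type, payload
        pos += 2 + length  # the length lets the reader step over any record


# Hypothetical type codes understood by this reader.
KNOWN_TYPES = {1: "reloc", 2: "import"}


def known_records(header: bytes) -> list:
    """Keep records of known type; skip the rest without decoding them."""
    return [(KNOWN_TYPES[t], p) for t, p in iter_records(header) if t in KNOWN_TYPES]


# Three records: type 1, an unknown type 99, and type 2.
sample = bytes([1, 2, 0xAA, 0xBB, 99, 3, 1, 2, 3, 2, 1, 0x10])
records = known_records(sample)
```

The reader never needs to understand the payload of the type-99 record; the length byte alone tells it where the next record starts. Without such a field, an unknown record type would make the rest of the header unreadable.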

The RDOFF format is strongly deprecated and has been disabled starting in NASM 2.15.04. [12]

References

  1. "Index of /pub/nasm/releasebuilds/2.16.01".
  2. "NASM, the Netwide Assembler". GitHub. 25 October 2021.
  3. Ram Narayan. "Linux assemblers: A comparison of GAS and NASM". IBM. Archived from the original on 3 October 2013. Quote: "two of the most popular assemblers for Linux, GNU Assembler (GAS) and Netwide Assembler (NASM)".
  4. "The Netwide Assembler". Retrieved 27 June 2008.
  5. "NASM Version History". Retrieved 3 August 2019.
  6. "NASM Manual". Archived from the original on 23 February 2009. Retrieved 15 August 2009.
  7. Randall Hyde. "NASM: The Netwide Assembler". Archived from the original on 12 September 2010. Retrieved 27 June 2008.
  8. "NASM Doc Search Engine". Archived from the original on 23 January 2010. Retrieved 14 September 2009.
  9. "NASM Manual Ch. 6". Retrieved 27 June 2008.
  10. "NASM CVS". 8 June 2008. Retrieved 27 June 2008.
  11. "V1-V2.txt". 4 December 2002. Retrieved 27 June 2008.
  12. "Relocatable Dynamic Object File Format (deprecated)".
