NOP (code)

In computer science, a NOP, no-op, or NOOP (pronounced "no op"; short for no operation) is a machine language instruction and its assembly language mnemonic, programming language statement, or computer protocol command that does nothing.

Machine language instructions

Some computer instruction sets include an instruction whose explicit purpose is to not change the state of any of the programmer-accessible registers, status flags, or memory. It often takes a well-defined number of clock cycles to execute. In other instruction sets, there is no explicit NOP instruction, but the assembly language mnemonic NOP represents an instruction which acts as a NOP; e.g., on the SPARC, sethi 0, %g0.

A NOP must not access memory, as that could cause a memory fault or page fault.

A NOP is most commonly used for timing purposes, to force memory alignment, to prevent hazards, to occupy a branch delay slot, to render void an existing instruction such as a jump, as a target of an execute instruction, or as a place-holder to be replaced by active instructions later on in program development (or to replace removed instructions when reorganizing would be problematic or time-consuming). In some cases, a NOP can have minor side effects; for example, on the Motorola 68000 series of processors, the NOP opcode causes a synchronization of the pipeline. [1]
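The alignment use mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the single-byte x86 NOP (0x90); the function name `pad_to_alignment` and the 16-byte boundary are hypothetical choices and do not come from any particular toolchain.

```python
X86_NOP = 0x90  # single-byte x86 NOP opcode (assumed target architecture)


def pad_to_alignment(code: bytes, alignment: int = 16) -> bytes:
    """Append NOP bytes so that the result's length is a multiple of `alignment`."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    padding = -len(code) % alignment  # bytes still missing to reach the boundary
    return code + bytes([X86_NOP]) * padding


# Example: a 5-byte instruction (mov eax, 1) padded out to a 16-byte boundary.
padded = pad_to_alignment(b"\xb8\x01\x00\x00\x00")
print(len(padded), padded.hex())
```

Real assemblers often prefer fewer, longer multi-byte NOP encodings for such padding; the sketch uses only the one-byte form to keep the arithmetic visible.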

Listed below are the NOP instructions for some CPU architectures:

CPU architecture | Instruction mnemonic | Bytes | Opcode | Notes
Intel x86 CPU family | NOP | 1; 1–9 for i686 and x86-64 | 0x90 [2] | 0x90 decodes to "NOP".

It is often said that "xchg rax,rax" is encoded as "NOP" (0x90). This is only partly accurate: 0x90 sits in the range of short-form "xchg" opcodes that exchange a register with the accumulator and carry no ModR/M byte, but the processor treats 0x90 itself as a dedicated NOP.

The general form of "xchg rax,rax" is encoded as 0x48 0x87 0xC0: 0x48 is the prefix selecting 64-bit operands, 0x87 is the opcode for "xchg", and 0xC0 is the ModR/M byte naming the same register, index 0 ("rax"), for both operands.

Most assemblers nevertheless encode "xchg rax,rax" as 0x90, both as an optimization (the long form of `xchg` is longer and may take more execution time than `nop`) and because the two have the same effect.

Accordingly, 0x48 0x90 in 64-bit mode and 0x90 in 32-bit mode are decoded as "nop", not as "xchg rax,rax" or "xchg eax,eax".

Because the assembler makes this replacement silently, the emitted bytes differ from what the source suggests, which is the origin of the confusion. To actually get the long form of "xchg rax,rax" into a program, the opcode bytes have to be inserted manually, which results in an instruction that is equivalent to, though not identical with, "NOP".

Intel 8051 / MCS-51 family | NOP | 1 | 0x00 |
DEC Alpha | NOP | 4 | 0x47FF041F | Opcode for BIS r31,r31,r31, an instruction that bitwise-ORs the always-0 register with itself.
AMD 29k | NOP | 4 | 0x70400101 | Opcode for aseq 0x40,gr1,gr1, an instruction that asserts that the stack register is equal to itself. [3]
ARM A32 | NOP | 4 | 0x00000000 | This stands for andeq r0, r0, r0. The assembly instruction nop will most likely expand to mov r0, r0, which is encoded 0xE1A00000 (little-endian architecture). [4]
ARM T32 (16 bit) | NOP | 2 | 0xb000 | Opcode for ADD SP, #0: add zero to the stack pointer (no operation). The assembly instruction nop will most likely expand to mov r8, r8, which is encoded 0x46C0. [5]
ARM T32 (32 bit) | NOP | 4 | 0xF3AF 8000 |
ARM A64 (64 bit) | NOP | 4 | 0xD503201F |
AVR | NOP | 2 | 0x0000 | One clock cycle
IBM System/360, IBM System/370, IBM System/390, z/Architecture, UNIVAC Series 90 | NOP | 4 | 0x47000000 or 0x470nnnnn, where "n" is any 4-bit value | NOP ("No-Op") and NOPR ("No-Op Register") are special cases of the "Branch on Condition" and "Branch on Condition Register" instructions, respectively. NOPR can be generated in two ways, NOP in one.

In both the NOP and NOPR instructions, the first value in the second byte is the "mask", which selects the condition to test, such as equal, not equal, high, low, etc. If the mask is 0, no branch occurs.

In the NOPR instruction, the second value in the second byte is the register to branch on. If register 0 is chosen, no branch occurs regardless of the mask value. Thus, if either of the two values in the second byte is 0, the branch will not happen.

In the NOP instruction, the second value in the second byte is the index register of the branch address, which is formed from an index register, a base register and a displacement. A register number of 0 here only means that the register takes no part in the address calculation, so the branch is suppressed only when the mask is 0.

IBM System/360 and successors (as above) | NOPR | 2 | 0x0700 or 0x070n or 0x07n0, where "n" is any 4-bit value |
SuperH | NOP | 2 | 0x0009 |
MIPS | NOP | 4 | 0x00000000 | Stands for sll r0,r0,0, meaning: logically shift register 0 zero bits to the left and store the result in register 0. Writes to register 0 are ignored; it always contains 0.
MIPS-X | NOP | 4 | 0x60000019 | Extended opcode for add r0,r0,r0
MIX | NOP | 1 word | ± * * * * 0 | The * bytes are arbitrary, and can be anything from 0 to the maximum byte (required to be in the range 63-99). MIX uses sign-magnitude representation.
MMIX | SWYM | 4 | 0xFD****** | SWYM stands for "Sympathize with your machinery". The * digits can be chosen arbitrarily.
Motorola 68000 family | NOP | 2 | 0x4E71 | This synchronizes the pipeline and prevents instruction overlap. [1]
Motorola 6809 | NOP | 1 | 0x12 |
MOS Technology 65xx (e.g. 6502) | NOP | 1 | 0xEA | NOP consumes two clock cycles. Undefined opcodes in the NMOS versions of the 65xx family were converted to be NOPs of varying instruction lengths and cycle times in the 65C02.
PA-RISC | NOP | 4 | 0x08000240 | Opcode for OR 0,0,0. [6]
PA-RISC | LDI 26,0 | 4 | 0x34000034 | Palindromic NOP, that is, an instruction that executes as NOP regardless of whether byte order is interpreted as little-endian or big-endian. Some PA-RISC system instructions are required to be followed by seven palindromic NOPs. [6]
PowerPC | NOP | 4 | 0x60000000 | Extended opcode for ori r0,r0,0
PIC microcontroller | NOP | 12 bits | 0b000000000000 | MOVW 0,W
RISC-V | NOP | 4 | 0x00000013 | ADDI x0, x0, 0
RISC-V | C.NOP | 2 | 0x0001 | C.ADDI x0, 0. Only available on RISC-V CPUs that support the "C" (compressed instructions) extension. [7]
SPARC | NOP | 4 | 0x01000000 | Stands for sethi 0, %g0, which zeroes the hardwired-to-zero %g0 register. [8]
Z80 | NOP | 1 | 0x00 | There are some other instructions without any effect (and the same timing): LD A, A, LD B, B etc.
PDP-10 | JFCL 0, (conventional) | 1 word | 25500******* (octal) | Jump never
PDP-10 | JUMP, SETA, SETAI, CAI, TRN, TLN | 1 word | | Jump never, set nothing, skip never
PDP-11 | NOP | 16 bits | 000240 (octal) | Clear none of the condition codes
VAX | NOP | 1 | 0x01 | Delay is dependent on processor type
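As an illustration of the x86 entry in the table above, the following Python sketch assembles the three bytes of the long form of "xchg rax,rax" by hand. The helper `modrm` and the constant names are hypothetical and chosen for this example; only the byte values themselves come from the text.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModR/M fields (2 + 3 + 3 bits) into one byte."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


REX_W = 0x48      # prefix selecting 64-bit operand size
XCHG_RM_R = 0x87  # opcode for xchg r/m, r
RAX = 0           # register index of rax

# mod=0b11 selects register-direct addressing, so both operands are registers.
long_form = bytes([REX_W, XCHG_RM_R, modrm(0b11, RAX, RAX)])
short_form = bytes([0x90])  # the one-byte NOP

print(long_form.hex())   # 4887c0
print(short_form.hex())  # 90
```

The two byte strings differ in length and content even though executing either one leaves the register state unchanged.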

From a hardware design point of view, unmapped areas of a bus are often designed to return zeroes; since the NOP slide behavior is often desirable, it gives a bias to coding it with the all-zeroes opcode.
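The all-zeroes encoding can be checked by splitting an instruction word into its fields. The sketch below assumes the usual MIPS R-type layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code); the function name `decode_mips_rtype` is hypothetical.

```python
def decode_mips_rtype(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


fields = decode_mips_rtype(0x00000000)
# Opcode 0 with function code 0 is SLL, and every register field is 0,
# so the all-zeroes word reads as: sll r0, r0, 0
print(fields)
```

Memory that reads back as zeroes is therefore decoded as a run of NOPs on such an architecture.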

Code

A function or a sequence of programming language statements is a NOP or null statement if it has no effect. Null statements may be required by the syntax of some languages in certain contexts.

Ada

In Ada, the null statement serves as a NOP. [9] As the syntax forbids that control statements or functions be empty, the null statement must be used to specify that no action is required. (Thus, if the programmer forgets to write a sequence of statements, the program will fail to compile.)

C and derivatives

The simplest NOP statement in C is the null statement, which is just a semicolon in a context requiring a statement:

  ;

Note that a C compiler generates no code for a null statement.

An empty block (compound statement) is also a NOP, and may be more legible, but it likewise generates no code:

  {}

In some cases, such as the body of a function, a block must be used, but this can be empty. In C, statements cannot be empty: simple statements must end with a ; (semicolon), while compound statements are enclosed in {} (braces), which do not themselves need a following semicolon. Thus, in contexts where a statement is grammatically required, some such null statement can be used.

The null statement is useless by itself, but it can have a syntactic use in a wider context, e.g., within the context of a loop:

while (getchar() != '\n') {}

alternatively,

while (getchar() != '\n')
    ;

or more tersely:

while (getchar() != '\n');

Note that the last form may be confusing, and it triggers a warning with some compilers or compiler options, because a semicolon placed after a closing parenthesis at the end of a line usually marks the end of a function call statement.

The above code continues calling the function getchar() until it returns a \n (newline) character, essentially fast-forwarding the current reading location of standard input to the beginning of next line.

Fortran

In Fortran, the CONTINUE statement is used in some contexts such as the last statement in a DO loop, although it can be used anywhere, and does not have any functionality.

JavaScript

The JavaScript language does not have a built-in NOP statement. Many implementations are possible.

In situations where a function is required, one alternative is to define a custom NOP function:

const noop = () => {};

AngularJS

The AngularJS framework provides the angular.noop function, which performs no operations.

jQuery

The jQuery library provides a function jQuery.noop(), which does nothing. [12]

Lodash

The Lodash library provides a function _.noop(), which returns undefined and does nothing. [13]

Pascal

As with C, the ; used by itself can be used as a null statement in Pascal. In fact, due to the specification of the language, in a BEGIN / END block, the semicolon is optional before the END statement, thus a semicolon used there is superfluous.

Also, a block consisting of BEGIN END; may be used as a placeholder to indicate no action, even if placed inside another BEGIN / END block.

Python

The Python programming language has a pass statement which has no effect when executed and thus serves as a NOP. It is primarily used to ensure correct syntax due to Python's indentation-sensitive syntax; for example the syntax for definition of a class requires an indented block with the class logic, which has to be expressed as pass when it should be empty.
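A minimal sketch of the places where pass is typically needed follows. The names `NotReadyError` and `on_event` are hypothetical and serve only as placeholders.

```python
class NotReadyError(Exception):
    pass  # the class body must contain at least one statement


def on_event(event):
    pass  # placeholder handler: accepted by callers, does nothing


for attempt in range(3):
    pass  # an empty loop body also needs a statement

print(on_event("click"))  # prints None
```

Removing any of the pass statements above makes the file fail to parse, because each indented block would then be empty.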

Shell scripting (bash, zsh, etc.)

The ':' (colon) command is a shell builtin that has a similar effect to a "NOP" (a do-nothing operation). It is not technically a NOP, as it changes the special parameter $? (exit status of last command) to 0. It may be considered a synonym for the shell builtin 'true', and its exit status is true (0). [14] [15] [16]

TeX macro language (ConTeXt, LaTeX, etc.)

The TeX typographical system's macro language has the \relax command. [17] It does nothing by itself, but may be used to prevent the immediately preceding command from parsing any subsequent tokens. [18]

NOP protocol commands

Many computer protocols, such as telnet, include a NOP command that a client can issue to request a response from the server without requesting any other actions. Such a command can be used to ensure the connection is still alive or that the server is responsive. A NOOP command is part of the following protocols (this is a partial list):

Note that unlike the other protocols listed, the IMAP4 NOOP command has a specific purpose—it allows the server to send any pending notifications to the client.

While most telnet or FTP servers respond to a NOOP command with "OK" or "+OK", some programmers have added quirky responses to the client. For example, the ftpd daemon of MINIX responds to NOOP with the message: [19]

 200 NOOP to you too!
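The keep-alive use of NOOP can be sketched with a toy exchange over a local socket pair. This does not implement any real protocol: the reply strings, the FTP-style numeric codes and the function name `serve_one_command` are assumptions made for the illustration.

```python
import socket


def serve_one_command(conn: socket.socket) -> None:
    """Toy server side: read one command line and answer it."""
    line = conn.recv(1024).decode("ascii").strip()
    if line.upper() == "NOOP":
        conn.sendall(b"200 NOOP ok\r\n")  # reply only; no other action is taken
    else:
        conn.sendall(b"500 Unknown command\r\n")


# A connected pair of local sockets stands in for client and server.
client, server = socket.socketpair()

client.sendall(b"NOOP\r\n")   # client asks for a response and nothing else
serve_one_command(server)      # server answers the single pending command
reply = client.recv(1024).decode("ascii").strip()
alive = reply.startswith("2")  # a 2xx-style code is treated as "still responsive"

print(reply, alive)

client.close()
server.close()
```

A real client would send such a command periodically and treat a missing or negative reply as a sign that the connection should be re-established.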

Cracking

NOPs are often involved when cracking software that checks for serial numbers, specific hardware or software requirements, presence or absence of hardware dongles, etc. in the form of a NOP slide. This process is accomplished by altering functions and subroutines to bypass security checks and instead simply return the expected value being checked for. Because most of the instructions in the security check routine will be unused, these would be replaced with NOPs, thus removing the software's security functionality without altering the positioning of everything which follows in the binary.

Security exploits

The NOP opcode can be used to form a NOP slide, which allows code to execute when the exact value of the instruction pointer is indeterminate (e.g., when a buffer overflow causes a function's return address on the stack to be overwritten).
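Why a run of NOPs tolerates an imprecise entry point can be shown with a toy machine. The sketch below is a simulation only: apart from 0x90, the opcode values and the names `run`, `TARGET` and `HALT` are invented for the example and correspond to no real instruction set.

```python
NOP, TARGET, HALT = 0x90, 0x01, 0xFF  # toy opcodes; only 0x90 mirrors x86


def run(memory: bytes, entry: int) -> bool:
    """Execute the toy machine from `entry`; return True if TARGET is reached."""
    pc = entry
    while pc < len(memory):
        op = memory[pc]
        if op == TARGET:
            return True
        if op != NOP:
            return False  # HALT or any unknown opcode stops execution
        pc += 1  # a NOP changes nothing except the program counter
    return False


slide_length = 32
memory = bytes([NOP]) * slide_length + bytes([TARGET, HALT])

# Every entry point inside the run of NOPs ends up at the same target.
results = [run(memory, entry) for entry in range(slide_length + 1)]
print(all(results))  # True
```

The longer the run of NOPs, the larger the range of entry addresses that lead to the same instruction.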

References

  1. "Motorola 68000 Programmer's Reference Manual" (PDF).
  2. "Intel 64 and IA-32 Architectures Software Developer's Manual: Instruction Set Reference A-Z" (PDF). Retrieved 2012-03-01.
  3. AMD, Am29050 Microprocessor User's Manual, 1991, pages 223 and 257.
  4. "4.8.4. NOP ARM pseudo-instruction". RealView Compilation Tools for BREW Assembler Guide.
  5. "5.6.3. NOP Thumb pseudo-instruction". RealView Compilation Tools for BREW Assembler Guide.
  6. Hewlett-Packard, PA-RISC 2.0 Architecture, 1995, pages 2-21 and 7-103. Archived on Jun 21, 2020.
  7. RISC-V Foundation, The RISC-V Instruction Set Manual, Volume 1: User-Level ISA, version 2.2, 7 May 2017, p.79.
  8. Weaver, D. L.; Germond, T., eds. (1994). The SPARC Architecture Manual, Version 9 (PDF). Prentice Hall. ISBN 0-13-825001-4. Archived from the original (PDF) on 2012-01-18. Retrieved 2014-01-09. Note that NOP is a special case of the SETHI instruction, with imm22 = 0 and rd = 0.
  9. Ada Reference Manual null statements. "The execution of a null_statement has no effect."
  10. MDN JavaScript reference – empty statement. "The empty statement is a semicolon (;) indicating that no statement will be executed, even if JavaScript syntax requires one."
  11. ECMAScript Language Specification – Edition 5.1 – Properties of the Function Prototype Object
  12. jQuery.noop() from jQuery API documentation
  13. "Lodash Documentation". lodash.com. Retrieved 2017-12-15.
  14. Advanced Bash-Scripting Guide > Chapter 3. Special Characters
  15. bash manpage > SHELL BUILTIN COMMANDS
  16. zsh manpage (zshbuiltins) > SHELL BUILTIN COMMANDS
  17. Bausum, David (2002). "TeX Primitive Control Sequences". TeX Reference Manual. Kluwer Academic Publishers. Retrieved 1 April 2020. According to The TeXbook, 'TeX does nothing' when it encounters \relax. Actually, \relax may tell TeX, 'This is the end of what you've been doing'.
  18. TeX wikibook – relax
  19. "ftpd.c". Retrieved 2016-06-19.