Processor register

A register-transfer level (RTL) description of an 8-bit register with detailed implementation, showing how 8 bits of data can be stored by using Flip-flops.

A processor register is a quickly accessible location available to a computer's processor. [1] Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but in some cases they are also assigned a memory address, as on the DEC PDP-10 and ICT 1900. [2]

Almost all computers, whether load/store architecture or not, load items of data from a larger memory into registers where they are used for arithmetic operations, bitwise operations, and other operations, and are manipulated or tested by machine instructions. Manipulated items are then often stored back to main memory, either by the same instruction or by a subsequent one. Modern processors use either static or dynamic RAM as main memory, with the latter usually accessed via one or more cache levels.

Processor registers are normally at the top of the memory hierarchy and provide the fastest way to access data. The term normally refers only to the group of registers that are directly encoded as part of an instruction, as defined by the instruction set. However, modern high-performance CPUs often have duplicates of these "architectural registers" to improve performance via register renaming, which allows parallel and speculative execution. x86 designs acquired these techniques around 1995 with the releases of the Pentium Pro, Cyrix 6x86, Nx586, and AMD K5.
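The idea behind register renaming can be illustrated with a toy model. The sketch below is not how any particular CPU implements it; the `Renamer` class, the `rename` helper, and the three-instruction program are hypothetical names chosen for illustration, and the allocator never recycles physical registers as real hardware would.

```python
from dataclasses import dataclass, field


@dataclass
class Renamer:
    """Toy rename stage: maps architectural register names to physical registers."""
    num_physical: int
    table: dict = field(default_factory=dict)  # architectural name -> physical index
    next_free: int = 0                         # naive allocator, never frees registers

    def _allocate(self) -> int:
        if self.next_free >= self.num_physical:
            raise RuntimeError("out of physical registers")
        reg = self.next_free
        self.next_free += 1
        return reg

    def read(self, arch: str) -> int:
        # A source operand uses the physical register currently mapped to the name.
        if arch not in self.table:
            self.table[arch] = self._allocate()
        return self.table[arch]

    def write(self, arch: str) -> int:
        # Each write gets a fresh physical register, so it cannot clobber a
        # value that an earlier, still-pending instruction needs to read.
        self.table[arch] = self._allocate()
        return self.table[arch]


def rename(program, num_physical=16):
    """Rename a list of (dest, src1, src2) instructions."""
    renamer = Renamer(num_physical)
    out = []
    for dest, src1, src2 in program:
        s1, s2 = renamer.read(src1), renamer.read(src2)  # sources first
        out.append((renamer.write(dest), s1, s2))
    return out


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r2"),  # r4 = r1 op r2  (true dependency on the first r1)
    ("r1", "r5", "r6"),  # r1 = r5 op r6  (only reuses the name r1)
]
renamed = rename(program)
print(renamed)
```

In the renamed program the second instruction still reads the physical register written by the first, while the third instruction writes a different physical register, so in this model it no longer has to wait for the second instruction to finish reading the old value of r1.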

When a computer program accesses the same data repeatedly, this is called locality of reference. Holding frequently used values in registers can be critical to a program's performance. Register allocation is performed either by a compiler in the code generation phase, or manually by an assembly language programmer.

Size

Registers are normally measured by the number of bits they can hold, for example, an "8-bit register", a "32-bit register", or a "64-bit register"; some are wider still. In some instruction sets, a register can operate in different modes that split its storage into smaller parts (a 32-bit register into four 8-bit parts, for instance), so that multiple data items (a vector, or one-dimensional array of data) can be loaded and operated on at the same time. Typically this is implemented by adding extra registers whose storage maps onto part of a larger register. Processors that can execute a single instruction on multiple data items are called vector processors.
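The splitting of one register into several smaller parts can be sketched in software. The example below models a 32-bit register as a Python integer holding four 8-bit lanes; the helper names `pack`, `unpack`, and `add_lanes` are assumptions made for this illustration and do not correspond to any specific instruction set.

```python
LANES = 4
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def add_lanes(a, b):
    """Lane-wise add with wraparound; carries never cross a lane boundary."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 250])
b = pack([10, 20, 30, 10])
print(unpack(add_lanes(a, b)))  # [11, 22, 33, 4]
```

The last lane wraps around (250 + 10 = 260, which is 4 modulo 256) without disturbing its neighbours, which is the property that distinguishes a lane-wise operation from an ordinary 32-bit addition.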

Types

A processor often contains several kinds of registers, which can be classified according to the types of values they can store or the instructions that operate on them.

Hardware registers are similar, but occur outside CPUs.

In some architectures (such as SPARC and MIPS), the first or last register in the integer register file is a pseudo-register: it is hardwired to always return zero when read (mostly to simplify indexing modes) and cannot be overwritten. In Alpha, the same is done for the floating-point register file. As a result, register files are commonly quoted as having one more register than is actually usable; for example, 32 registers are quoted when only 31 of them fit the above definition of a register.
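The behaviour of a hardwired zero register can be modelled in a few lines. This is a minimal sketch, assuming a 32-entry file of 32-bit registers with register 0 as the zero register; the `RegisterFile` class and its method names are hypothetical.

```python
class RegisterFile:
    """Toy integer register file whose register 0 always reads as zero."""

    def __init__(self, size=32):
        self._regs = [0] * size

    def read(self, index):
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        if index != 0:  # writes to register 0 are silently discarded
            self._regs[index] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(0, 123)                      # has no effect
rf.write(5, 42)
rf.write(6, rf.read(5) + rf.read(0))  # a "move r6, r5" built from an add with r0
print(rf.read(0), rf.read(6))         # 0 42
```

One consequence of this design is that operations such as a register-to-register move can be expressed with an existing add instruction instead of needing an encoding of their own.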

Examples

The following table shows the number of registers in several mainstream CPU architectures. Note that in x86-compatible processors, the stack pointer (ESP) is counted as an integer register, even though only a limited number of instructions can operate on its contents. Similar caveats apply to most architectures.

Although the architectures listed below all differ, almost all follow a basic arrangement known as the von Neumann architecture, first proposed by the Hungarian-American mathematician John von Neumann. The number of registers on GPUs is also much higher than on CPUs.

| Architecture | GPRs/data+address registers | FP registers | Notes |
|---|---|---|---|
| AT&T Hobbit | 0 | stack of 7 | All data manipulation instructions work solely within registers, and data must be moved into a register before processing. |
| Cray-1 [3] | 8 scalar data, 8 address | 8 scalar, 8 vector (64 elements) | Scalar data registers can be integer or floating-point; also 64 scalar scratch-pad T registers and 64 address scratch-pad B registers |
| 4004 [4] | 1 accumulator, 16 others | 0 | |
| 8008 [5] | 1 accumulator, 6 others | 0 | The A register is an accumulator to which all arithmetic is done; the H and L registers can be used in combination as an address register; all registers can be used as operands in load/store/move/increment/decrement instructions and as the other operand in arithmetic instructions. There is no floating-point unit (FPU) available. |
| 8080 [6] | 1 accumulator, 1 stack pointer, 6 others | 0 | The A register is an accumulator to which all arithmetic is done; the register pairs B+C, D+E, and H+L can be used as address registers in some instructions; all registers can be used as operands in load/store/move/increment/decrement instructions and as the other operand in arithmetic instructions. Some instructions only use H+L; another instruction swaps H+L and D+E. Floating-point processors intended for the 8080 were Intel 8231, AMD Am9511, and Intel 8232. They were also readily usable with the Z80 and similar processors. |
| iAPX432 | 0 | stack of 6 | Stack machine |
| 16-bit x86 [7] | 8 | stack of 8 (if FP present) | The 8086/8088, 80186/80188, and 80286 processors, if provided an 8087, 80187 or 80287 co-processor for floating-point operations, support an 80-bit wide, 8 deep register stack with some instructions able to use registers relative to the top of the stack as operands; without a co-processor, no floating-point registers are supported. |
| IA-32 [8] | 8 | stack of 8 (if FP present), 8 (if SSE/MMX present) | The 80386 processor requires an 80387 for floating-point operations, later processors had built-in floating-point, with both having an 80-bit wide, 8 deep register stack with some instructions able to use registers relative to the top of the stack as operands. The Pentium III and later had the SSE with additional 128-bit XMM registers. |
| x86-64 [8] [9] | 16 | 16 or 32 (if AVX-512 available) | FP registers are 128-bit XMM registers, later extended to 256-bit YMM registers with AVX/AVX2 and 512-bit ZMM0–ZMM31 registers with AVX-512. [10] |
| Fairchild F8 [11] | 1 accumulator, 64 scratchpad registers, 1 indirect scratchpad register (ISAR) | | Instructions can directly reference the first 16 scratchpad registers and can access all scratchpad registers indirectly through the ISAR [12] |
| Geode GX | 1 data, 1 address | 8 | Geode GX/Media GX/4x86/5x86 is the emulation of 486/Pentium compatible processor made by Cyrix/National Semiconductor. Like Transmeta, the processor had a translation layer that translated x86 code to native code and executed it. [citation needed] It does not support 128-bit SSE registers, just the 80387 stack of eight 80-bit floating-point registers, and partially supports 3DNow! from AMD. The native processor only contains 1 data and 1 address register for all purposes and it is translated into 4 paths of 32-bit naming registers r1 (base), r2 (data), r3 (back pointer), and r4 (stack pointer) within scratchpad SRAM for integer operations. [citation needed] |
| Sunplus μ'nSP (SPG200) | 8 (sp, r1-r4, bp, sr, pc) | 0 | A 16-bit processor from the Taiwanese company Sunplus Technology, which can be found on VTech's V.Smile line of educational video game consoles, as well as other consoles such as the Wireless 60 and various Jakks Pacific plug-in TV games. |
| VM Labs Nuon | 0 | 1 | A 32-bit stack machine processor developed by VM Labs and specialized for multimedia. It can be found on the company's own Nuon DVD player console line and the Game Wave Family Entertainment System from ZaPit games. The design was heavily influenced by Intel's MMX technology; it contained a 128-byte unified stack cache for both vector and scalar instructions. The unified cache can be divided as eight 128-bit vector registers or thirty-two 32-bit SIMD scalar registers through bank renaming; there is no integer register in this architecture. |
| Nios II [13] [14] | 31 | 8 | Nios II is based on the MIPS IV instruction set [citation needed] and has 31 32-bit GPRs, with register 0 being hardwired to zero, and eight 64-bit floating-point registers [citation needed] |
| Motorola 6800 [15] | 2 data, 1 index, 1 stack | 0 | |
| Motorola 68k [16] | 8 data (d0–d7), 8 address (a0–a7) | 8 (if FP present) | Address register 8 (a7) is the stack pointer. 68000, 68010, 68012, 68020, and 68030 require an FPU for floating-point; 68040 had FPU built in. FP registers are 80-bit. |
| SH 16-bit | 16 | 6 | |
| Emotion Engine | 3(VU0)+ 32(VU1) | 32 SIMD (integrated in UV1) + 2 × 32 Vector (dedicated vector co-processor located nearby its GPU) | The Emotion Engine's main core (VU0) is a heavily modified DSP general core intended for general background tasks and it contains one 64-bit accumulator, two general data registers, and one 32-bit program counter. A modified MIPS III executable core (VU1) is for game data and protocol control, and it contains thirty-two 32-bit general-purpose registers for integer computation and thirty-two 128-bit SIMD registers for storing SIMD instructions, streaming data value and some integer calculation value, and one accumulator register for connecting general floating-point computation to the vector register file on the co-processor. The coprocessor is built via a 32-entry 128-bit vector register file (can only store vector values that pass from the accumulator in the CPU) and no integer registers are built in. Both the vector co-processor (VPU 0/1) and the Emotion Engine's entire main processor module (VU0 + VU1 + VPU0 + VPU1) are built based on a modified MIPS instructions set. The accumulator in this case is not general-purpose but control status. |
| CUDA [17] | configurable, up to 255 per thread | | Earlier generations allowed up to 127/63 registers per thread (Tesla/Fermi). The more registers are configured per thread, the fewer threads can run at the same time. Registers are 32 bits wide; double-precision floating-point numbers and 64-bit pointers therefore require two registers. It additionally has up to 8 predicate registers per thread. [18] |
| CDC 6000 series [19] | 16 | 8 | 8 'A' registers, A0–A7, hold 18-bit addresses; 8 'B' registers, B0–B7, hold 18-bit integer values (with B0 permanently set to zero); 8 'X' registers, X0–X7, hold 60 bits of integer or floating-point data. Seven of the eight 18-bit A registers were coupled to their corresponding X registers: setting any of the A1–A5 registers to a value caused a memory load of the contents of that address into the corresponding X register. Likewise, setting an address into registers A6 or A7 caused a memory store into that location in memory from X6 or X7. (Registers A0 and X0 were not coupled like this). |
| System/360, [20] System/370, [21] System/390, z/Architecture [22] | 16 | 4 (if FP present); 16 in G5 and later S/390 models and z/Architecture | FP was optional in System/360, and always present in S/370 and later. In processors with the Vector Facility, there are 16 vector registers containing a machine-dependent number of 32-bit elements. [23] Some registers are assigned a fixed purpose by calling conventions; for example, register 14 is used for subroutine return addresses and, for ELF ABIs, register 15 is used as a stack pointer. The S/390 G5 processor increased the number of floating-point registers to 16. [24] |
| MMIX [25] | 256 | 256 | An instruction set designed by Donald Knuth in the late 1990s for pedagogical purposes. |
| NS320xx [26] | 8 | 8 (if FP present) | |
| Xelerated X10 | 1 | 32 | A 32/40-bit stack machine-based network processor with a modified MIPS instruction set and a 128-bit floating-point unit. [citation needed] |
| Parallax Propeller | 0 | 2 | An eight-core 8/16-bit sliced stack machine controller with a simple logic circuit inside, it has 8 cog counters (cores), each containing three 8/16 bit special control registers with 32 bit x 512 stack RAM. However, it does not contain any general register for integer purposes. Unlike most shadow register files in modern processors and multi-core systems, all of the stack RAM in cog can be accessed in instruction level, which allows all of these cogs to act as a single general-purpose core if necessary. Floating-point unit is external and it contains two 80-bit vector registers. |
| Itanium [27] | 128 | 128 | And 64 1-bit predicate registers and 8 branch registers. The FP registers are 82-bit. |
| SPARC [28] | 31 | 32 | Global register 0 is hardwired to 0. Uses register windows. |
| IBM POWER | 32 | 32 | Also included are a link register, a count register, and a multiply quotient (MQ) register. |
| PowerPC/Power ISA [29] | 32 | 32 | Also included are a link register and a count register. Processors supporting the Vector facility also have 32 128-bit vector registers. |
| Blackfin [30] | 8 data, 2 accumulator, 6 address | 0 | Also included are a stack pointer and a frame pointer. Additional registers are used to implement zero-overhead loops and circular buffer DAGs (data address generators). |
| IBM Cell SPE | 128 | | 128 general purpose registers, which can hold integer, address, or floating-point values [31] |
| PDP-10 | 16 | | All of the registers may be used generally (integer, float, stack pointer, jump, indexing, etc.). Every 36-bit memory (or register) word can also be manipulated as a half-word, which can be considered an (18-bit) address. Other word interpretations are used by certain instructions. In the original PDP-10 processors, these 16 GPRs also corresponded to main (i.e. core) memory locations 0–15; a hardware option called "fast memory" implemented the registers as separate ICs, and references to memory locations 0–15 referred to the IC registers. Later models implemented the registers as "fast memory" and continued to make memory locations 0–15 refer to them. Movement instructions take (register, memory) operands: MOVE 1,2 is register-register, and MOVE 1,1000 is memory-to-register. |
| PDP-11 | 7 | 6 (if FPP present) | R7 is the program counter. Any register can be a stack pointer but R6 is used for hardware interrupts and traps. |
| VAX [32] | 16 | | The general purpose registers are used for floating-point values as well. Three of the registers have special uses: R12 (Argument Pointer), R13 (Frame Pointer), and R14 (Stack Pointer), while R15 refers to the Program Counter. |
| Alpha [33] | 31 | 31 | Registers R31 (integer) and F31 (floating-point) are hardwired to zero. |
| 6502 | 1 data, 2 index | 0 | 6502's content A (Accumulator) register for main purpose data store and memory address (8-bit data/16-bit address), X and Y are indirect and direct index registers (respectively) and the SP registers are specific index only. |
| W65C816S | 1 | 0 | 65c816 is the 16-bit successor of the 6502. X, Y, and D (Direct Page register) are condition registers and SP register are specific index only. Main accumulator extended to 16-bit (C) [34] while keeping 8-bit (A) for compatibility and main registers can now address up to 24-bit (16-bit wide data instruction/24-bit memory address). |
| MeP | 4 | 8 | Media-embedded processor was a 32-bit processor developed by Toshiba with a modded 8080 instruction set. Only the A, B, C, and D registers are available through all modes (8/16/32-bit). It is incompatible with x86; however, it contains an 80-bit floating-point unit that is x87-compatible. |
| PIC microcontroller | 1 | 0 | |
| AVR microcontroller | 32 | 0 | |
| ARM 32-bit (ARM/A32, Thumb-2/T32) | 14 | Varies (up to 32) | r15 is the program counter, and not usable as a general purpose register; r13 is the stack pointer; r8–r13 can be switched out for others (banked) on a processor mode switch. Older versions had 26-bit addressing, [35] and used upper bits of the program counter (r15) for status flags, making that register 32-bit. |
| ARM 32-bit (Thumb) | 8 | 16 | Version 1 of Thumb, which only supported access to registers r0 through r7 [36] |
| ARM 64-bit (A64) [37] | 31 | 32 | Register r31 is the stack pointer or hardwired to 0, depending on the context. |
| MIPS [38] | 31 | 32 | Integer register 0 is hardwired to 0. |
| RISC-V [39] | 31 | 32 | Integer register 0 is hardwired to 0. The RV32E variant, intended for systems with very limited resources, has 15 integer registers. |
| Epiphany | 64 (per core) [40] | | Each instruction controls whether registers are interpreted as integers or single precision floating point. Architecture is scalable to 4096 cores with 16 and 64 core implementations currently available. |

Usage

The number of registers available on a processor and the operations that can be performed using those registers have a significant impact on the efficiency of code generated by optimizing compilers. The Strahler number of an expression tree gives the minimum number of registers required to evaluate that expression tree.
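The Strahler number can be computed with a short recursion. The sketch below assumes binary expression trees written as nested tuples `(op, left, right)` with plain strings as leaves, and assumes every leaf value must first be placed in a register; the function name `strahler` and the two sample trees are illustrative choices.

```python
def strahler(node):
    """Strahler number of a binary expression tree.

    A node is either a leaf (any non-tuple value, such as a variable name)
    or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return 1  # a leaf needs one register to hold its value
    _, left, right = node
    l, r = strahler(left), strahler(right)
    # Equal needs: one extra register holds the first result while the other
    # subtree is evaluated. Unequal needs: evaluate the costlier subtree first.
    return l + 1 if l == r else max(l, r)


balanced = ("*", ("+", "a", "b"), ("+", "c", "d"))  # (a + b) * (c + d)
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")     # ((a + b) + c) + d

print(strahler(balanced))  # 3
print(strahler(chain))     # 2
```

Both trees contain four operands and three operators, yet the left-leaning chain needs one register fewer than the balanced tree, because each intermediate result in the chain is consumed immediately.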

References

  1. "What is a processor register?". Educative: Interactive Courses for Software Developers. Retrieved 2022-08-12.
  2. "A Survey of Techniques for Designing and Managing CPU Register File".
  3. "Cray-1 Computer System Hardware Reference Manual" (PDF). Cray Research. November 1977. Archived (PDF) from the original on 2021-11-07. Retrieved 2022-12-23.
  4. "MCS-4 Micro Computer Set Users Manual" (PDF). Intel. February 1973. Archived (PDF) from the original on 2005-02-24.
  5. "8008 8 Bit Parallel Central Processor Unit Users Manual" (PDF). Intel. November 1973. Archived (PDF) from the original on 2007-10-04. Retrieved January 23, 2014.
  6. "Intel 8080 Microcomputer Systems User's Manual" (PDF). Intel. September 1975. Archived (PDF) from the original on 2010-12-06. Retrieved January 23, 2014.
  7. "80286 and 80287 Programmer's Reference Manual" (PDF). Intel. 1987. Archived (PDF) from the original on 2015-07-23.
  8. "Intel 64 and IA-32 Architectures Software Developer Manuals". Intel. 4 December 2019.
  9. "AMD64 Architecture Programmer's Manual Volume 1: Application Programming" (PDF). AMD. October 2013.
  10. "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" (PDF). Intel. January 2018.
  11. F8, Preliminary Microprocessor User's Manual (PDF). Fairchild. January 1975.
  12. F8 Guide to Programming (PDF). Fairchild MOS Microcomputer Division. 1977.
  13. "Nios II Classic Processor Reference Guide" (PDF). Altera. April 2, 2015.
  14. "Nios II Gen2 Processor Reference Guide" (PDF). Altera. April 2, 2015.
  15. "M6800 Programming Reference Manual" (PDF). Motorola. November 1976. Archived (PDF) from the original on 2011-10-14. Retrieved May 18, 2015.
  16. "Motorola M68000 Family Programmer's Reference Manual" (PDF). Motorola. 1992. Retrieved November 10, 2024.
  17. "CUDA C Programming Guide". Nvidia. 2019. Retrieved Jan 9, 2020.
  18. Jia, Zhe; Maggioni, Marco; Staiger, Benjamin; Scarpazza, Daniele P. (2018). "Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking". arXiv:1804.06826 [cs.DC].
  19. Control Data 6000 Series Computer Systems, Reference Manual (PDF). Control Data Corporation. July 1965.
  20. IBM System/360 Principles of Operation (PDF). IBM.
  21. IBM System/370, Principles of Operation (PDF). IBM. September 1, 1975.
  22. z/Architecture, Principles of Operation (PDF) (Seventh ed.). IBM. 2008.
  23. "IBM Enterprise Systems Architecture/370 and System/370 - Vector Operations" (PDF). IBM. SA22-7125-3. Retrieved May 11, 2020.
  24. "IBM S/390 G5 Microprocessor" (PDF).
  25. "MMIX Home Page".
  26. "Series 32000 Databook" (PDF). National Semiconductor. Archived (PDF) from the original on 2017-11-25.
  27. Intel Itanium Architecture, Software Developer's Manual, Volume 3: Intel Itanium Instruction Set Reference (PDF). Intel. May 2010.
  28. Weaver, David L.; Germond, Tom (eds.). The SPARC Architecture Manual, Version 9 (PDF). Santa Clara, California: SPARC International, Inc.
  29. Power ISA Version 3.1B (PDF). OpenPOWER Foundation. September 14, 2021.
  30. Blackfin Processor, Programming Reference, Revision 2.2 (PDF). Analog Devices. February 2013.
  31. "Synergistic Processor Unit Instruction Set Architecture Version 1.2" (PDF). IBM. January 27, 2007.
  32. Leonard, Timothy E., ed. (1987). VAX Architecture, Reference Manual (PDF). DEC books.
  33. Alpha Architecture Reference Manual (PDF) (Fourth ed.). Compaq Computer Corporation. January 2002.
  34. "Learning 65816 Assembly". Super Famicom Development Wiki. Retrieved 14 November 2019.
  35. "Procedure Call Standard for the ARM Architecture" (PDF). ARM Holdings. 30 November 2013. Retrieved 27 May 2013.
  36. "2.6.2. The Thumb-state register set". ARM7TDMI Technical Reference Manual. ARM Holdings.
  37. Arm A64 Instruction Set Architecture, Armv8, for Armv8-A architecture profile (PDF). Arm. 2021.
  38. MIPS64 Architecture For Programmers, Volume II: The MIPS64 Instruction Set (PDF). MIPS Technologies. March 12, 2001. Retrieved October 6, 2024.
  39. Waterman, Andrew; Asanović, Krste, eds. (May 2017). The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 2.2 (PDF). RISC-V Foundation.
  40. "Epiphany Architecture Reference" (PDF).