Orthogonal instruction set

Last updated

In computer engineering, an orthogonal instruction set is an instruction set architecture where all instruction types can use all addressing modes. It is "orthogonal" in the sense that the instruction type and the addressing mode may vary independently. An orthogonal instruction set does not impose a limitation that requires a certain instruction to use a specific register [1] so there is little overlapping of instruction functionality. [2]

Contents

Orthogonality was considered a major goal for processor designers in the 1970s, and the VAX-11 is often used as the benchmark for this concept. However, the introduction of RISC design philosophies in the 1980s significantly reversed the trend against more orthogonality.

Modern CPUs often simulate orthogonality in a preprocessing step before performing the actual tasks in a RISC-like core. This "simulated orthogonality" in general is a broader concept, encompassing the notions of decoupling and completeness in function libraries, like in the mathematical concept: an orthogonal function set is easy to use as a basis into expanded functions, ensuring that parts don’t affect another if we change one part.

Basic concepts

At their core, all general purpose computers work in the same underlying fashion; data stored in a main memory is read by the central processing unit (CPU) into a fast temporary memory (e.g. CPU registers), acted on, and then written back to main memory. Memory consists of a collection of data values, encoded as numbers [lower-alpha 1] and referred to by their addresses, also a numerical value. This means the same operations applied to the data can be applied to the addresses themselves. [lower-alpha 2] While being worked on, data can be temporarily held in processor registers, scratchpad values that can be accessed very quickly. Registers are used, for example, when adding up strings of numbers into a total. [3]

Single instruction, single operand

In early computers, the instruction set architecture (ISA) often used a single register, in which case it was known as the accumulator. Instructions included an address for the operand. For instance, an ADD address instruction would cause the CPU to retrieve the number in memory found at that address and then add it to the value already in the accumulator. This very simple example ISA has a "one-address format" because each instruction includes the address of the data. [4]

One-address machines have the disadvantage that even simple actions like an addition require multiple instructions, each of which takes up scarce memory, [lower-alpha 3] and requires time to be read. Consider the simple task of adding two numbers, 5 + 4. In this case, the program would have to load the value 5 into the accumulator with the LOAD address instruction, use the ADD address instruction pointing to the address for the 4, and finally SAVE address to store the result, 9, back to another memory location. [4]

Single instruction, multiple operands

Further improvements can be found by providing the address of both of the operands in a single instruction, for instance, ADD address 1, address 2. Such "two-address format" ISAs are very common. One can further extend the concept to a "three-address format" where the SAVE is also folded into an expanded ADD address 1, address 2, address of result. [4]

It is often the case that the basic computer word is much larger than needed to hold just the instruction and an address, and in most systems, there are leftover bits that can be used to hold a constant instead of an address. Instructions can be further improved if they allow any one of the operands to be replaced by a constant. For instance, ADD address 1, constant 1 eliminates one memory cycle, and ADD constant 1, constant 2 another. [4]

Multiple data

Further complexity arises when one considers common patterns in which memory is accessed. One very common pattern is that a single operation may be applied across a large amount of similar data. For instance, one might want to add up 1,000 numbers. In a simple two-address format of instructions, [lower-alpha 4] there is no way to change the address, so 1,000 additions have to be written in the machine language. ISAs fix this problem with the concept of indirect addressing, in which the address of the next point of data is not a constant, but itself held in memory or a machine register. This means the programmer can change the address by performing addition on that memory location or register. ISAs also often include the ability to offset an address from an initial location, by adding a value held in one of its registers, in some cases a special index register. Others carry out this addition automatically as part of the instructions that use it. [4]

The variety of addressing modes leads to a profusion of slightly different instructions. Considering a one-address ISA, for even a single instruction, ADD, we now have many possible "addressing modes":

Many ISAs also have registers that can be used for addressing as well as math tasks. This can be used in a one-address format if a single address register is used. In this case, a number of new modes become available:

Orthogonality

Orthogonality is the principle that every instruction should be able to use any supported addressing mode. In this example, if the direct addressing version of ADD is available, all the others should be as well. The reason for this design is not aesthetic, the goal is to reduce the total size of a program's object code. By providing a variety of addressing modes, the ISA allows the programmer to choose the one that precisely matches the need of their program at that point, and thereby reduce the need to use multiple instructions to achieve the same end. This means the total number of instructions is reduced, both saving memory and improving performance. Orthogonality was often described as being highly "bit efficient". [5]

Keeping the addressing mode specifier bits separate from the opcode operation bits produces an orthogonal instruction set.

As the ultimate end of orthogonal design is simply to allow any instruction to use any type of address, implementing orthogonality is often simply a case of adding more wiring between the parts of the processor. However, it also adds to the complexity of the instruction decoder, the circuitry that reads an instruction from memory at the location pointed to by the program counter and then decides how to process it. [5]

In the example ISA outlined above, the ADD.C instruction using direct encoding already has the data it needs to run the instruction and no further processing is needed, the decoder simply sends the value into the arithmetic logic unit (ALU). However, if the ADD.A instruction is used, the address has to be read, the value at that memory location read, and then the ALU can continue. This series of events will take much longer to complete and requires more internal steps. [5]

As a result, the time needed to complete different variations of an instruction can vary widely, which adds complexity to the overall CPU design. Therefore, orthogonality represents a tradeoff in design; the computer designer can choose to offer more addressing modes to the programmer to improve code density at the cost of making the CPU itself more complex. [5]

When memory was small and expensive, especially during the era of drum memory or core memory, orthogonality was highly desirable. However, the complexity was often beyond what could be achieved using current technology. For this reason, most machines from the 1960s offered only partial orthogonality, as much as the designers could afford. It was in the 1970s that the introduction of large scale integration significantly reduced the complexity of computer designs and fully orthogonal designs began to emerge. By the 1980s, such designs could be implemented on a single-chip CPU. [5]

In the late 1970s, with the first high-powered fully orthogonal designs emerging, the goal widened to become the high-level language computer architecture, or HLLCA for short. Just as orthogonality was desired to improve the bit density of machine language, HLLCA's goal was to improve the bit density of high-level languages like ALGOL 68. These languages generally used an activation record, a type of complex stack that stored temporary values, which the ISAs generally did not directly support and had to be implemented using many individual instructions from the underlying ISA. Adding support for these structures would allow the program to be translated more directly into the ISA. [5]

Orthogonality in practice

The PDP-11

The PDP-11 was substantially orthogonal (primarily excepting its floating point instructions). [6] Most integer instructions could operate on either 1-byte or 2-byte values and could access data stored in registers, stored as part of the instruction, stored in memory, or stored in memory and pointed to by addresses in registers or memory. Even the PC and the stack pointer could be affected by the ordinary instructions using all of the ordinary data modes. "Immediate" mode (hardcoded numbers within an instruction, such as ADD #4, R1 (R1 = R1 + 4) was implemented as the mode "register indirect, autoincrement" and specifying the program counter (R7) as the register to use reference for indirection and to autoincrement. (Encoded as ADD (R7)+,R1 .word 4.) [7]

The PDP-11 used 3-bit fields for addressing modes (0-7) so there were (electronically) 8 addressing modes. An additional 3-bit field specified the registers (R0–R5, SP, PC). Immediate and absolute address operands applying the two autoincrement modes to the Program Counter (R7), provided a total of 10 conceptual addressing modes. Most two operand instructions supported all addressing modes for both parameters. [7]

The VAX-11

The VAX-11 extended the PDP-11's orthogonality to all data types, including floating point numbers. [5] Instructions such as 'ADD' were divided into data-size dependent variants such as ADDB, ADDW, ADDL, ADDP, ADDF for add byte, word, longword, packed BCD and single-precision floating point, respectively. Like the PDP-11, the Stack Pointer and Program Counter were in the general register file (R14 and R15). [8]

The general form of a VAX-11 instruction would be:

 opcode [ operand ] [ operand ]  ...

Each component being one byte, the opcode a value in the range 0–255, and each operand consisting of two nibbles, the upper 4 bits specifying an addressing mode, and the lower 4 bits (usually) specifying a register number (R0–R15). [8]

In contrast to the PDP-11's 3-bit fields, the VAX-11's 4-bit sub-bytes resulted in 16 addressing modes (0–15). However, addressing modes 0–3 were "short immediate" for immediate data of 6 bits or less (the 2 low-order bits of the addressing mode being the 2 high-order bits of the immediate data, when prepended to the remaining 4 bits in that data-addressing byte). Since addressing modes 0-3 were identical, this made 13 (electronic) addressing modes, but as in the PDP-11, the use of the Stack Pointer (R14) and Program Counter (R15) created a total of over 15 conceptual addressing modes (with the assembler program translating the source code into the actual stack-pointer or program-counter based addressing mode needed). [8]

The MC68000 and similar

Motorola's designers attempted to make the assembly language orthogonal while the underlying machine language was somewhat less so. Unlike PDP-11, the MC68000 (68k) used separate registers to store data and the addresses of data in memory. The ISA was orthogonal to the extent that addresses could only be used in those registers, but there was no restriction on which of the registers could be used by different instructions. Likewise, the data registers were also orthogonal across instructions. Unlike the PDP-11, the 68000 only supported one general addressing mode for two-parameter instructions. The other parameter was always a register, with the exception of MOV. The MOV instructions supported all addressing modes for both parameters. [9]

In contrast, the NS320xx series were originally designed as single-chip implementations of the VAX-11 ISA. Although this had to change due to legal issues, the resulting system retained much of the VAX-11's overall design philosophy and remained completely orthogonal. [10] This included the elimination of the separate data and address registers found in the 68k. [11]

The 8080 and follow on designs

The 8-bit Intel 8080 (as well as the 8085 and 8051) microprocessor was basically a slightly extended accumulator-based design and therefore not orthogonal. An assembly-language programmer or compiler writer had to be mindful of which operations were possible on each register: Most 8-bit operations could be performed only on the 8-bit accumulator (the A-register), while 16-bit operations could be performed only on the 16-bit pointer/accumulator (the HL-register pair), whereas simple operations, such as increment, were possible on all seven 8-bit registers. This was largely due to a desire to keep all opcodes one byte long.

The binary-compatible Z80 later added prefix-codes to escape from this 1-byte limit and allow for a more powerful instruction set. The same basic idea was employed for the Intel 8086, although, to allow for more radical extensions, binary-compatibility with the 8080 was not attempted here. It maintained some degree of non-orthogonality for the sake of high code density at the time. The 32-bit extension of this architecture that was introduced with the 80386, was somewhat more orthogonal despite keeping all the 8086 instructions and their extended counterparts. However, the encoding-strategy used still shows many traces from the 8008 and 8080 (and Z80). For instance, single-byte encodings remain for certain frequent operations such as push and pop of registers and constants; and the primary accumulator, the EAX register, employs shorter encodings than the other registers on certain types of operations. Observations like this are sometimes exploited for code optimization in both compilers and hand written code.

RISC

A number of studies through the 1970s demonstrated that the flexibility offered by orthogonal modes was rarely or never used in actual problems. In particular, an effort at IBM studied traces of code running on the System/370 and demonstrated that only a fraction of the available modes were being used in actual programs. Similar studies, often about the VAX, demonstrated the same pattern. In some cases, it was shown that the complexity of the instructions meant they took longer to perform than the sequence of smaller instructions, with the canonical example of this being the VAX's INDEX instruction. [12]

During this same period, semiconductor memories were rapidly increasing in size and decreasing in cost. However, they were not improving in speed at the same rate. This meant the time needed to access data from memory was growing in relative terms in comparison to the speed of the CPUs. This argued for the inclusion of more registers, giving the CPU more temporary values to work with. A larger number of registers meant more bits in the computer word would be needed to encode the register number, which suggested that the instructions themselves be reduced in number to free up room.

Finally, a paper by Andrew Tanenbaum demonstrated that 97% of all the constants in a program are between 0 and 10, with 0 representing between 20 and 30% of the total. Additionally, between 30 and 40% of all the values in a program are constants, with simple variables (as opposed to arrays or such) another 35 to 40%. [13] If the processor uses a larger instruction word, like 32-bits, two register numbers and a constant can be encoded in a single instruction as long as the instruction itself does not use too many bits.

These observations led to the abandonment of the orthogonal design as a primary goal of processor design, and the rise of the RISC philosophy in the 1980s. RISC processors generally have only two addressing modes, direct (constant) and register. All of the other modes found in older processors are handled explicitly using load and store instructions moving data to and from the registers. Only a few addressing modes may be available, and these modes may vary depending on whether the instruction refers to data or involves a transfer of control.

Notes

  1. See digitization.
  2. address are fact simple binary numbers which may be treated as data
  3. Even in modern computers, performance is maximized by keeping data in the cache, a limited resource.
  4. assuming that the address cannot be operated on

Related Research Articles

<span class="mw-page-title-main">Accumulator (computing)</span> Register in which intermediate arithmetic and logic results of a CPU are stored

In a computer's central processing unit (CPU), the accumulator is a register in which intermediate arithmetic logic unit results are stored.

A complex instruction set computer is a computer architecture in which single instructions can execute several low-level operations or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC) and has therefore become something of an umbrella term for everything that is not RISC, where the typical differentiating characteristic is that most RISC designs use uniform instruction length for almost all instructions, and employ strictly separate load and store instructions.

<span class="mw-page-title-main">Data General Nova</span> 16-bit minicomputer series

The Data General Nova is a series of 16-bit minicomputers released by the American company Data General. The Nova family was very popular in the 1970s and ultimately sold tens of thousands of units.

<span class="mw-page-title-main">DEC Alpha</span> 64-bit RISC instruction set architecture

Alpha is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers (CISC) and to be a highly competitive RISC processor for Unix workstations and similar markets.

In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer, also known as its machine code. It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in Intel and AMD general-purpose CPUs in contemporary desktops and laptops, it functions only as a fallback path for scenarios that the faster hardwired control unit is unable to manage.

MIPS is a family of reduced instruction set computer (RISC) instruction set architectures (ISA) developed by MIPS Computer Systems, now MIPS Technologies, based in the United States.

<span class="mw-page-title-main">Machine code</span> Lowest level instructions executed by a computer

In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binary representation of a computer program which is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions.

<span class="mw-page-title-main">Reduced instruction set computer</span> Processor executing one instruction in minimal clock cycles

In electronics and computer science, a reduced instruction set computer (RISC) is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set computer (CISC), a RISC computer might require more instructions in order to accomplish a task because the individual instructions are written in simpler code. The goal is to offset the need to process more instructions by increasing the speed of each instruction, in particular by implementing an instruction pipeline, which may be simpler to achieve given simpler instructions.

The NS32000, sometimes known as the 32k, is a series of microprocessors produced by National Semiconductor. The first member of the family came to market in 1982, briefly known as the 16032 before becoming the 32016. It was the first general-purpose microprocessor on the market that used 32-bit data internally: the Motorola 68000 had 32-bit registers and instructions to perform 32-bit arithmetic, but used a 16-bit ALU for arithmetic operations on data, and thus took twice as long to perform those arithmetic operations. However, the 32016 contained many bugs and often could not be run at its rated speed. These problems, and the presence of the otherwise similar 68000 which had been available since 1980, led to little use in the market.

<span class="mw-page-title-main">Endianness</span> Order of bytes in a computer word

In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed in computer memory, counting only byte significance compared to earliness. Endianness is primarily expressed as big-endian (BE) or little-endian (LE), terms introduced by Danny Cohen into computer science for data ordering in an Internet Experiment Note published in 1980. The adjective endian has its origin in the writings of 18th century Anglo-Irish writer Jonathan Swift. In the 1726 novel Gulliver's Travels, he portrays the conflict between sects of Lilliputians divided into those breaking the shell of a boiled egg from the big end or from the little end. By analogy, a CPU may read a digital word big end first, or little end first.

In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.

SuperH is a 32-bit reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Hitachi and currently produced by Renesas. It is implemented by microcontrollers and microprocessors for embedded systems.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

<span class="mw-page-title-main">Index register</span> CPU register used for modifying operand addresses

An index register in a computer's CPU is a processor register used for pointing to operand addresses during the run of a program. It is useful for stepping through strings and arrays. It can also be used for holding loop iterations and counters. In some architectures it is used for read/writing blocks of memory. Depending on the architecture it may be a dedicated index register or a general-purpose register. Some instruction sets allow more than one index register to be used; in that case additional instruction fields may specify which index registers to use.

A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address e.g. DEC PDP-10, ICT 1900.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

The PDP-11 architecture is a 16-bit CISC instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). It is implemented by central processing units (CPUs) and microprocessors used in PDP-11 minicomputers. It was in wide use during the 1970s, but was eventually overshadowed by the more powerful VAX architecture in the 1980s.

An instruction set architecture (ISA) is an abstract model of a computer, also referred to as computer architecture. A realization of an ISA is called an implementation. An ISA permits multiple implementations that may vary in performance, physical size, and monetary cost ; because the ISA serves as the interface between software and hardware. Software that has been written for an ISA can run on different implementations of the same ISA. This has enabled binary compatibility between different generations of computers to be easily achieved, and the development of computer families. Both of these developments have helped to lower the cost of computers and to increase their applicability. For these reasons, the ISA is one of the most important abstractions in computing today.

A compressed instruction set, or simply compressed instructions, are a variation on a microprocessor's instruction set architecture (ISA) that allows instructions to be represented in a more compact format. In most real-world examples, compressed instructions are 16 bits long in a processor that would otherwise use 32-bit instructions. The 16-bit ISA is a subset of the full 32-bit ISA, not a separate instruction set. The smaller format requires some tradeoffs: generally, there are fewer instructions available, and fewer processor registers can be used.

<span class="mw-page-title-main">WD16</span> Microprocessor produced by Western Digital

The WD16 is a 16-bit microprocessor introduced by Western Digital in October 1976. It is based on the MCP-1600 chipset, a general-purpose design that was also used to implement the DEC LSI-11 low-end minicomputer and the Pascal MicroEngine processor. The three systems differed primarily in their microcode, giving each system a unique instruction set architecture (ISA).

References

  1. Null, Linda; Lobur, Julia (2010). The Essentials of Computer Organization and Architecture. Jones & Bartlett Publishers. pp. 287–288. ISBN   978-1449600068.
  2. Tariq, Jamil (1995), "RISC vs CISC: Why less is more", IEEE Potentials (August/September), retrieved 7 May 2019
  3. "Basic Computer Organization & Design" (PDF). Computational Sensory-Motor Systems Laboratory.
  4. 1 2 3 4 5 Tullsen, Dean. "Instruction Set Architecture" (PDF). UCSD.
  5. 1 2 3 4 5 6 7 Hennessy, John; Patterson, David (2002-05-29). Computer Architecture: A Quantitative Approach. Elsevier. p. 151. ISBN   9780080502526.
  6. "Introduction to the PDP-11". University of Sydney.
  7. 1 2 "PDP-11 instruction reference" (PDF). University of Toronto.
  8. 1 2 3 "Another Approach to Instruction Set Architecture—VAX" (PDF).
  9. Veronis, Andrew (2012-12-06). The 68000 Microprocessor. Springer. p. 54. ISBN   9781468466478.
  10. Tilson, Michael (October 1983). "Moving Unix to New Machines". BYTE. p. 266. Retrieved 31 January 2015.
  11. "NS32532". Datormuseum.
  12. Patterson, D. A.; Ditzel, D. R. (1980). "The case for the reduced instruction set computer". ACM SIGARCH Computer Architecture News . 8 (6): 25–33. CiteSeerX   10.1.1.68.9623 . doi:10.1145/641914.641917. S2CID   12034303.
  13. Tanenbaum, Andrew (1978). "Implications of structured programming for machine architecture". Communications of the ACM. 21 (3): 237–246. doi: 10.1145/359361.359454 . hdl:1871/2610. S2CID   3261560.