MAJC

Last updated
MAJC
Designer Sun Microsystems
Introduced1990s
Design VLIW

MAJC (Microprocessor Architecture for Java Computing) was a Sun Microsystems multi-core, multithreaded, very long instruction word (VLIW) microprocessor design from the mid-to-late 1990s. Originally called the UltraJava processor, the MAJC processor was targeted at running Java programs, whose "late compiling" allowed Sun to make several favourable design decisions. The processor was released into two commercial graphical cards from Sun. Lessons learned regarding multi-threads on a multi-core processor provided a basis for later OpenSPARC implementations such as the UltraSPARC T1.

Contents

Design elements

Move instruction scheduling to the compiler

Like other VLIW designs, notably Intel's IA-64 (Itanium), MAJC attempted to improve performance by moving several expensive operations out of the processor and into the related compilers. In general, VLIW designs attempt to eliminate the instruction scheduler, which often represents a relatively large amount of the overall processor's transistor budget. With this portion of the CPU removed to software, those transistors can be used for other purposes, often to add additional functional units to process more instructions at once, or to increase the amount of cache memory to reduce the amount of time spent waiting for data to arrive from the much slower main memory. Although MAJC shared these general concepts, it was unlike other VLIW designs, and processors in general, in a number of specific details.

Generalized functional units

Most processors include a number of separate "subprocessors" known as functional units that are tuned to operating on a particular type of data. For instance, a modern CPU typically has two or three functional units dedicated to processing integer data and logic instructions, known as ALUs, while other units handle floating-point numbers, the FPUs, or multimedia data, SIMD. MAJC instead used a single multi-purpose functional unit which could process any sort of data. In theory this approach meant that processing any one type of data would take longer, perhaps much longer, than processing the same data in a unit dedicated to that type of data. But on the other hand, these general-purpose units also meant that you did not end up with large portions of the CPU being unused because the program just happened to be doing many (for example) floating point calculations at that particular point in time.

Variable-length instruction packets

Another difference is that MAJC allowed for variable-length "instruction packets", which under VLIW contain a number of instructions that the compiler has determined can be run at the same time. Most VLIW architectures use fixed-length packets and when they cannot find an instruction to run they instead fill it with a NOP, which simply takes up space. Although variable-length instruction packets added some complexity to the CPU, it reduced code size and thus the number of expensive cache misses by increasing the amount of code in the cache at any one time.

Avoiding interlocks and stalls

The primary difference was the way that the MAJC design required the compiler to avoid interlocks, pauses in execution while the results of one instruction need to be processed for the next to be able to run. For instance, if the processor is fed the instructions C = A + B, E = C + D, then the second instruction can be run only after the first completes. Most processors include locks in the design to stall out and reschedule these sorts of interlocked instructions, allowing some other instructions to run while the value of C is being calculated. However these interlocks are very expensive in terms of chip real-estate, and represents the majority of the instruction scheduler's logic.

In order for the compiler to avoid these interlocks, it would have to know exactly how long each of these instructions would take to complete. For instance, if a particular implementation took three cycles to complete a floating-point multiplication, MAJC compilers would attempt to schedule in other instructions that took three cycles to complete and were not currently stalled. A change in the actual implementation might reduce this delay to only two instructions, however, and the compiler would need to be aware of this change.

This means that the compiler was not tied to MAJC as a whole, but a particular implementation of MAJC, each individual CPU based on the MAJC design. This would normally be a serious logistical problem; consider the number of different variations of the Intel IA-32 design for instance, each one would need its own dedicated compiler and the developer would have to produce a different binary for every one. However it is precisely this concept that drives the Java marketthere is indeed a different compiler for each ISA, and it is installed on the client's machine instead of the developer's. The developer ships only a single bytecode version of their program, and the user's machine compiles that to the underlying platform.

In reality, scheduling instructions in this fashion turns out to be a very difficult problem. In real-world use, processors that attempt to do this scheduling at runtime encounter numerous events when the data needed is outside the cache, and there is no other instruction in the program that isn't also dependent on such data. In these cases the processor might stall for long periods, waiting on main memory. The VLIW approach does not help much in this regard; although the compiler might be able to spend more time looking for instructions to run, that doesn't mean that it can actually find one.

MAJC attempted to address this problem through the ability to execute code from other threads if the current thread stalled on memory. Switching threads is normally a very expensive process known as a context switch, and on a normal processor the switch would overwhelm any savings and generally slow the machine down. On MAJC, the system could hold the state for up to four threads in memory at the same time, reducing the context switch to a few instructions in length. This feature has since appeared on other processors; Intel refers to it as HyperThreading.

MAJC took this idea one step further, and tried to prefetch data and instructions needed for threads while they were stalled. Most processors include similar functionality for parts of an instruction stream, known as speculative execution, where the processor runs both of the possible outcomes of a branch while waiting for the deciding variable to calculate. MAJC instead continued to run the thread as if it were not stalled, using this execution to find and then load any data or instructions that would soon be needed when the thread stopped stalling. Sun referred to this as Space-Time Computing (STC), and it is a speculative multithreading design.

Processors up to this point had tried to extract parallelism in a single thread, a technique that was reaching its limits in terms of diminishing returns. In seems that in a general sense the MAJC design attempted to avoid stalls by running across threads (and programs) as opposed to looking for parallelism in a single thread. VLIW is generally expected to be somewhat worse in terms of stalls because it is difficult to understand runtime behavior at compile-time, making the MAJC approach in dealing with this problem particularly interesting.

Implementations

Sun built a single model of the MAJC, the two-core MAJC 5200, which was the heart of Sun's XVR-1000 and XVR-4000 workstation graphics boards. However many of the multicore and multithreading design ideas, notably in terms of using multiple threads to reduce stalling delays, worked their way into the Sun SPARC processor line, as well as designs from other companies. Additionally, the MAJC idea of designing the processor to run as many threads as possible, as opposed to instructions, appears to be the basis of the later UltraSPARC T1 (code-named Niagara) design.

See also

Further reading

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU) is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel, whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute in sequence only. VLIW is intended to allow higher performance without the complexity inherent in some other designs.

The 88000 is a RISC instruction set architecture developed by Motorola during the 1980s. The MC88100 arrived on the market in 1988, some two years after the competing SPARC and MIPS. Due to the late start and extensive delays releasing the second-generation MC88110, the m88k achieved very limited success outside of the MVME platform and embedded controller environments. When Motorola joined the AIM alliance in 1991 to develop the PowerPC, further development of the 88000 ended.

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

IA-64 is the instruction set architecture (ISA) of the discontinued Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed Merced, was released in 2001.

The Intel i860 is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of the 1980s. It was the world's first million-transistor chip. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

<span class="mw-page-title-main">Instruction-level parallelism</span> Ability of computer instructions to be executed simultaneously with correct results

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

In computer engineering, out-of-order execution is a paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

<span class="mw-page-title-main">UltraSPARC T1</span> Microprocessor by Sun Microsystems

Sun Microsystems' UltraSPARC T1 microprocessor, known until its 14 November 2005 announcement by its development codename "Niagara", is a multithreading, multicore CPU. Designed to lower the energy consumption of server computers, the CPU typically uses 72 W of power at 1.4 GHz.

<span class="mw-page-title-main">Multi-core processor</span> Microprocessor with more than one processing unit

A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.

<span class="mw-page-title-main">UltraSPARC T2</span> Microprocessor by Sun Microsystems

Sun Microsystems' UltraSPARC T2 microprocessor is a multithreading, multi-core CPU. It is a member of the SPARC family, and the successor to the UltraSPARC T1. The chip is sometimes referred to by its codename, Niagara 2. Sun started selling servers with the T2 processor in October 2007.

<span class="mw-page-title-main">History of general-purpose CPUs</span>

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

<span class="mw-page-title-main">Multithreading (computer architecture)</span> Ability of a CPU to provide multiple threads of execution concurrently

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

Explicit data graph execution, or EDGE, is a type of instruction set architecture (ISA) which intends to improve computing performance compared to common processors like the Intel x86 line. EDGE combines many individual instructions into a larger group known as a "hyperblock". Hyperblocks are designed to be able to easily run in parallel.

<span class="mw-page-title-main">TRIPS architecture</span>

TRIPS was a microprocessor architecture designed by a team at the University of Texas at Austin in conjunction with IBM, Intel, and Sun Microsystems. TRIPS uses an instruction set architecture designed to be easily broken down into large groups of instructions (Graphs) that can be run on independent processing elements. The design collects related data into the graphs, attempting to avoid expensive data reads and writes and keeping the data in high speed memory close to the processing elements. The prototype TRIPS processor contains 16 such elements. TRIPS hoped to reach 1 TFLOP on a single processor, as papers were published from 2003 to 2006.

The SPARC64 V (Zeus) is a SPARC V9 microprocessor designed by Fujitsu. The SPARC64 V was the basis for a series of successive processors designed for servers, and later, supercomputers.