Jazelle

Last updated

Jazelle DBX (direct bytecode execution) [1] is an extension that allows some ARM processors to execute Java bytecode in hardware as a third execution state alongside the existing ARM and Thumb modes. [2] Jazelle functionality was specified in the ARMv5TEJ architecture [3] and the first processor with Jazelle technology was the ARM926EJ-S. [4] Jazelle is denoted by a "J" appended to the CPU name, except for post-v5 cores where it is required (albeit only in trivial form) for architecture conformance.

Contents

Jazelle RCT (Runtime Compilation Target) is a different technology based on ThumbEE mode; it supports ahead-of-time (AOT) and just-in-time (JIT) compilation with Java and other execution environments.

The most prominent use of Jazelle DBX is by manufacturers of mobile phones to increase the execution speed of Java ME games and applications.[ citation needed ] A Jazelle-aware Java virtual machine (JVM) will attempt to run Java bytecode in hardware, while returning to the software for more complicated, or lesser-used bytecode operations. ARM claims that approximately 95% of bytecode in typical program usage ends up being directly processed in the hardware.

The published specifications are very incomplete, being only sufficient for writing operating system code that can support a JVM that uses Jazelle.[ citation needed ] The declared intent is that only the JVM software needs to (or is allowed to) depend on the hardware interface details. This tight binding facilitates the hardware and JVM evolving together without affecting other software. In effect, this gives ARM Holdings considerable control over which JVMs are able to exploit Jazelle.[ citation needed ] It also prevents open source JVMs from using Jazelle. These issues do not apply to the ARMv7 ThumbEE environment, the nominal successor to Jazelle DBX.

Implementation

The Jazelle extension uses low-level binary translation, implemented as an extra stage between the fetch and decode stages in the processor instruction pipeline. Recognised bytecodes are converted into a string of one or more native ARM instructions.

The Jazelle mode moves JVM interpretation into hardware for the most common simple JVM instructions. This is intended to significantly reduce the cost of interpretation. Among other things, this reduces the need for Just-in-time compilation and other JVM accelerating techniques. [5] JVM instructions that are not implemented in Jazelle hardware cause appropriate routines in the Jazelle-aware JVM implementation to be invoked. Details are not published, since all JVM innards are transparent (except for performance) if correctly interpreted.

Jazelle mode is entered via the BXJ instructions. A hardware implementation of Jazelle will only cover a subset of JVM bytecodes. For unhandled bytecodes—or if overridden by the operating system—the hardware will invoke the software JVM. The system is designed so that the software JVM does not need to know which bytecodes are implemented in hardware and a software fallback is provided by the software JVM for the full set of bytecodes.

Instruction set

The Jazelle instruction set is well documented as Java bytecode. However, ARM has not released details on the exact execution environment details; the documentation provided with Sun's HotSpot Java Virtual Machine goes as far as to state: "For the avoidance of doubt, distribution of products containing software code to exercise the BXJ instruction and enable the use of the ARM Jazelle architecture extension without [..] agreement from ARM is expressly forbidden." [6]

Employees of ARM have in the past published several white papers that do give some good pointers about the processor extension.[ citation needed ] Versions of the ARM Architecture reference Manual available from 2008 have included pseudocode for the "BXJ" (Branch and eXchange to Java) instruction, but with the finer details being shown as "SUB-ARCHITECTURE DEFINED" and documented elsewhere.

Application binary interface (ABI)

The Jazelle state relies on an agreed calling convention between the JVM and the Jazelle hardware state. This application binary interface is not published by ARM, rendering Jazelle an undocumented feature for most users and Free Software JVMs.

The entire VM state is held within normal ARM registers, allowing compatibility with existing operating systems and interrupt handlers unmodified. Restarting a bytecode (such as following a return from interrupt) will re-execute the complete sequence of related ARM instructions.

Specific registers are designated to hold the most important parts of the JVM state: registers R0–R3 hold an alias of the top of the Java stack, R4 holds Java local operand zero (pointer to *this) and R6 contains the Java stack pointer. [7]

Jazelle reuses the existing program counter PC or its synonym register R15. A pointer to the next bytecode goes in R14, [8] so the use of the PC is not generally user-visible except during debugging.

CPSR: Mode indication

Java bytecode is indicated as the current instruction set by a combination of two bits in the ARM CPSR (Current Program Status Register). The "T"-bit must be cleared and the "J"-bit set. [9]

Bytecodes are decoded by the hardware in two stages (versus a single stage for Thumb and ARM code) and switching between hardware and software decoding (Jazelle mode and ARM mode) takes ~4 clock cycles. [10]

For entry to Jazelle hardware state to succeed, the JE (Jazelle Enable) [3] bit in the CP14:C0(C2)[bit 0] register must be set; clearing of the JE bit by a [privileged] operating system provides a high-level override to prevent application programs from using the hardware Jazelle acceleration. [11] Additionally, the CV (Configuration Valid) bit [3] found in CP14:c0(c1)[bit 1] [11] must be set to show that there is a consistent Jazelle state setup for the hardware to use.

BXJ: Branch to Java

The BXJ instruction attempts to switch to Jazelle state, and if allowed and successful, sets the "J" bit in the CPSR; otherwise, it "falls through" and acts as a standard BX (Branch) instruction. [3] The only time when an operating system or debugger must be fully aware of the Jazelle mode is when decoding a faulted or trapped instruction. The Java program counter (PC) pointing to the next instructions must be placed in the Link Register (R14) before executing the BXJ branch request, as regardless of hardware or software processing, the system must know where to begin decoding.

Because the current state is held in the CPSR, the bytecode instruction set is automatically reselected after task-switching and processing of the current Java bytecode is restarted. [7]

Following an entry into the Jazelle state mode, bytecodes can be processed in one of three ways: decoded and executed natively in hardware, handled in software (with optimised ARM/ThumbEE JVM code), or treated as an invalid/illegal opcode. The third case will cause a branch to an ARM exception mode, as will a Java bytecode of 0xff, which is used for setting JVM breakpoints. [12]

Execution will continue in hardware until an unhandled bytecode is encountered, or an exception occurs. Between 134 and 149 bytecodes (out of 203 bytecodes specified in the JVM specification) are translated and executed directly in the hardware.

Low-level registers

Low-level configuration registers, for the hardware virtual machine, are held in the ARM Co-processor "CP14 register c0". The registers allow detecting, enabling or disabling the hardware accelerator (if it is available). [13]

A "trivial" hardware implementation of Jazelle (as found in the QEMU emulator) is only required to support the BXJ opcode itself (treating BXJ as a normal BX instruction [3] ) and to return RAZ (Read-As-Zero) for all of the CP14:c0 Jazelle-related registers. [14]

Successor: ThumbEE

The ARMv7 architecture has de-emphasized Jazelle and Direct Bytecode Execution of JVM bytecodes. In implementation terms, only trivial hardware support for Jazelle is now required: support for entering and exiting Jazelle mode, but not for executing any Java bytecodes.

Instead, the Thumb Execution Environment (ThumbEE) was to be preferred, but has since also been deprecated. Support for ThumbEE was mandatory in ARMv7-A processors (such as the Cortex-A8 and Cortex-A9), and optional in ARMv7-R processors. ThumbEE targeted compiled environments, perhaps using JIT technologies. It was not at all specific to Java, and was fully documented; much broader adoption was anticipated than Jazelle was able to achieve.

ThumbEE was a variant of the Thumb2 16/32-bit instruction set. It integrated null pointer checking; defined some new fault mechanisms; and repurposed the 16-bit LDM and STM opcode space to support a few instructions such as range checking, a new handler invocation scheme, and more. Accordingly, compilers that produced Thumb or Thumb2 code could be modified to work with ThumbEE-based runtime environments.

Related Research Articles

<span class="mw-page-title-main">Java virtual machine</span> Java Virtual machine

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Colloquially, their names were "186", "286", "386" and "486".

In computer science, threaded code is a programming technique where the code has a form that essentially consists entirely of calls to subroutines. It is often used in compilers, which may generate code in that form or be implemented in that form themselves. The code may be processed by an interpreter or it may simply be a sequence of machine code call instructions.

In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.

ARM is a family of RISC instruction set architectures (ISAs) for computer processors. Arm Ltd. develops the ISAs and licenses them to other companies, who build the physical devices that use the instruction set. It also designs and licenses cores that implement these ISAs.

SuperH is a 32-bit reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Hitachi and currently produced by Renesas. It is implemented by microcontrollers and microprocessors for embedded systems.

In computing, an opcode is the portion of a machine language instruction that specifies the operation to be performed. Beside the opcode itself, most instructions also specify the data they will process, in the form of operands. In addition to opcodes used in the instruction set architectures of various CPUs, which are hardware devices, they can also be used in abstract computing machines as part of their byte code specifications.

Fetching the instruction opcodes from program memory well in advance is known as prefetching and it is served by using a prefetch input queue (PIQ). The pre-fetched instructions are stored in a queue. The fetching of opcodes well in advance, prior to their need for execution, increases the overall efficiency of the processor boosting its speed. The processor no longer has to wait for the memory access operations for the subsequent instruction opcode to complete. This architecture was prominently used in the Intel 8086 microprocessor.

In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automata with additional load/store operations or multiple stacks and hence are Turing-complete.

ARM9 is a group of 32-bit RISC ARM processor cores licensed by ARM Holdings for microcontroller use. The ARM9 core family consists of ARM9TDMI, ARM940T, ARM9E-S, ARM966E-S, ARM920T, ARM922T, ARM946E-S, ARM9EJ-S, ARM926EJ-S, ARM968E-S, ARM996HS. Since ARM9 cores were released from 1998 to 2006, they are no longer recommended for new IC designs, instead ARM Cortex-A, ARM Cortex-M, ARM Cortex-R cores are preferred.

A Java processor is the implementation of the Java virtual machine (JVM) in hardware. In other words, the Java bytecode that makes up the instruction set of the abstract machine becomes the instruction set of a concrete machine. These were the most popular form of a high-level language computer architecture, and were "an attractive choice for building embedded and real-time systems that are programmed in Java". However, as of 2017, embedded Java is "pretty much dead" and no realtime Java chip vendors exist.

The 65xx family of microprocessors, consisting of the MOS Technology 6502 and its derivatives, the WDC 65C02, WDC 65C802 and WDC 65C816, and CSG 65CE02, all handle interrupts in a similar fashion. There are three hardware interrupt signals common to all 65xx processors and one software interrupt, the BRK instruction. The WDC 65C816 adds a fourth hardware interrupt—ABORT, useful for implementing virtual memory architectures—and the COP software interrupt instruction, intended for use in a system with a coprocessor of some type.

In software development, the programming language Java was historically considered slower than the fastest 3rd generation typed languages such as C and C++. The main reason being a different language design, where after compiling, Java programs run on a Java virtual machine (JVM) rather than directly on the computer's processor as native code, as do C and C++ programs. Performance was a matter of concern because much business software has been written in Java after the language quickly became popular in the late 1990s and early 2000s.

<span class="mw-page-title-main">ARM Cortex-A8</span>

The ARM Cortex-A8 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture.

<span class="mw-page-title-main">ARM Cortex-M</span> Group of 32-bit RISC processor cores

The ARM Cortex-M is a group of 32-bit RISC ARM processor cores licensed by ARM Limited. These cores are optimized for low-cost and energy-efficient integrated circuits, which have been embedded in tens of billions of consumer devices. Though they are most often the main component of microcontroller chips, sometimes they are embedded inside other types of chips too. The Cortex-M family consists of Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, Cortex-M35P, Cortex-M55, Cortex-M85. A floating-point unit (FPU) option is available for Cortex-M4 / M7 / M33 / M35P / M55 / M85 cores, and when included in the silicon these cores are sometimes known as "Cortex-MxF", where 'x' is the core variant.

The ARM Cortex-A is a group of 32-bit and 64-bit RISC ARM processor cores licensed by Arm Holdings. The cores are intended for application use. The group consists of 32-bit only cores: ARM Cortex-A5, ARM Cortex-A7, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A12, ARM Cortex-A15, ARM Cortex-A17 MPCore, and ARM Cortex-A32, 32/64-bit mixed operation cores: ARM Cortex-A35, ARM Cortex-A53, ARM Cortex-A55, ARM Cortex-A57, ARM Cortex-A72, ARM Cortex-A73, ARM Cortex-A75, ARM Cortex-A76, ARM Cortex-A77, ARM Cortex-A78, ARM Cortex-A710, and ARM Cortex-A510 Refresh, and 64-bit only cores: ARM Cortex-A34, ARM Cortex-A65, ARM Cortex-A510 (2021), ARM Cortex-A715, ARM Cortex-A520, and ARM Cortex-A720.

In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming languages, see List of JVM languages.

<span class="mw-page-title-main">AArch64</span> 64-bit extension of the ARM architecture

AArch64 or ARM64 is the 64-bit extension of the ARM architecture family.

References

  1. US 7089539,"Program instruction interpretation"
  2. "Artificial Intelligence Enhanced Computing". Archived from the original on 2014-03-28.
  3. 1 2 3 4 5 "ARM Architecture reference Manual" (PDF). arm.com. Archived from the original on 2007-01-26.{{cite web}}: CS1 maint: unfit URL (link)
  4. "Shanghai Jade Licenses ARM Prime Starter Kit for DCP SoC". Design And Reuse. 2004-01-12. Archived from the original on 2004-02-06.
  5. "CPM Design Online - Using ARM DBX hardware extensions to accelerate Java in space-constrained embedded apps". Archived from the original on 2008-12-21. Retrieved 2009-02-25.
  6. "Release Notes - CLDC HotSpotTM Implementation - Version 1.1.3". java.sun.com. 2007-10-26. Archived from the original on 2008-06-02.{{cite web}}: CS1 maint: unfit URL (link)
  7. 1 2 "ARM Whitepaper, Accelerating to meet the challenge of embedded Java" (PDF). jp.arm.com. 2004-04-14. Archived from the original (PDF) on 2009-01-09.
  8. Intel, ARM Architecture introduction. Dead link, February 2020 [ permanent dead link ]
  9. Marinas, Catalin (4 June 2007). "Re: [RFC][PATCH] Add ARM Jazelle state info in show_regs tombstone". linux-arm-kernel (Mailing list). Retrieved 5 June 2020.
  10. ARM Whitepaper, High performance Java on embedded devices
  11. 1 2 "ARM アーキテクチャ リファレンスマニュアル" [ARM Architecture Reference Manual](PDF). jp.arm.com (in Japanese). 2008-09-10. Archived from the original (PDF) on 2008-09-10.
  12. ARM, ARM1026EJ-S Technical reference Manual
  13. ARM reference Manual, Understanding ARM11 Processor Power Saving Modes
  14. ARM reference, Cortex-A8 Technical reference Manual