Java bytecode

Java bytecode is the instruction set of the Java virtual machine (JVM), the form in which programs written in Java and other JVM-compatible languages are executed. [1] Each operation is identified by a single-byte opcode, hence the name "bytecode", which keeps the instruction encoding compact. [2] This intermediate form makes Java programs platform-independent: they are compiled not to native machine code but to a format that any conforming JVM implementation can execute.

The JVM either interprets this bytecode or compiles it on the fly into native machine code with a just-in-time (JIT) compiler, which improves the performance of Java applications. Bytecode is designed for cross-platform compatibility and security: it runs inside the JVM's controlled environment, so Java applications behave consistently across hardware and software configurations. [3] Java programmers rarely work with bytecode directly, but understanding its structure and execution can help with optimization and debugging.

In the JVM, Java bytecode operates as a set of instructions for both a stack machine and a register machine, utilizing an operand stack and local variables for executing operations. [2] The bytecode comprises various instruction types, including data manipulation, control transfer, object creation and manipulation, and method invocation, all integral to Java's object-oriented programming model. [1]

Relation to Java

A Java programmer does not need to be aware of or understand Java bytecode at all. However, as suggested in the IBM developerWorks journal, "Understanding bytecode and what bytecode is likely to be generated by a Java compiler helps the Java programmer in the same way that knowledge of assembly helps the C or C++ programmer." [4]

Instruction set architecture

The JVM is both a stack machine and a register machine. Each frame for a method call has an "operand stack" and an array of "local variables". [5] :2.6 The operand stack is used for operands to computations and for receiving the return value of a called method, while local variables serve the same purpose as registers and are also used to pass method arguments. The maximum size of the operand stack and local variable array, computed by the compiler, is part of the attributes of each method. [5] :4.7.3 Each can be independently sized from 0 to 65535 values, where each value is 32 bits. long and double types, which are 64 bits, take up two consecutive local variables [5] :2.6.1 (which need not be 64-bit aligned in the local variables array) or one value in the operand stack (but are counted as two units in the depth of the stack). [5] :2.6.2
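The slot accounting described above can be illustrated with a short sketch. It assumes the argument types are given as a class-file method descriptor (for example "(IJD)V", where I is int, J is long, D is double); the class name SlotCounter and its method are hypothetical and not part of any JVM API.

```java
// Hypothetical helper: counts the local-variable slots a method's arguments occupy.
final class SlotCounter {

    /**
     * Counts slots for the arguments in a method descriptor such as "(IJD)V".
     * long (J) and double (D) take two consecutive slots; every other type
     * takes one. Instance methods use one extra slot (index 0) for `this`.
     */
    static int argumentSlots(String descriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1;
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            if (c == 'J' || c == 'D') {
                slots += 2;                              // 64-bit value: two slots
                i++;
            } else if (c == 'L') {
                slots += 1;                              // object reference
                i = descriptor.indexOf(';', i) + 1;      // skip the class name
            } else if (c == '[') {
                slots += 1;                              // array reference
                while (descriptor.charAt(i) == '[') i++; // skip all dimensions
                if (descriptor.charAt(i) == 'L') {
                    i = descriptor.indexOf(';', i) + 1;
                } else {
                    i++;
                }
            } else {
                slots += 1;                              // int, float, short, byte, char, boolean
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots("(IJD)V", true));
        System.out.println(argumentSlots("(Ljava/lang/String;[JI)V", false));
    }
}
```

Note that an array of long is itself a single reference, so it occupies one slot rather than two.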

Instruction set

Each instruction is composed of a one-byte opcode, followed by zero or more bytes for operands. [5] :2.11
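As a rough illustration of this encoding, the following sketch decodes a byte array into mnemonics. It handles only a handful of opcodes, whose numeric values are taken from the JVM specification's opcode table; the class TinyDisassembler is hypothetical and far simpler than a real disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately incomplete decoder for a few JVM instructions.
final class TinyDisassembler {

    /** Returns one "offset: mnemonic operands" string per decoded instruction. */
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;   // the opcode is always one unsigned byte
            switch (opcode) {
                case 0x05: out.add(pc + ": iconst_2"); pc += 1; break;
                case 0x1b: out.add(pc + ": iload_1");  pc += 1; break;
                case 0x3c: out.add(pc + ": istore_1"); pc += 1; break;
                case 0x10:                   // bipush: one signed operand byte
                    out.add(pc + ": bipush " + code[pc + 1]);
                    pc += 2;
                    break;
                case 0x11: {                 // sipush: two operand bytes, big-endian, signed
                    int value = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    out.add(pc + ": sipush " + value);
                    pc += 3;
                    break;
                }
                case 0xb1: out.add(pc + ": return"); pc += 1; break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x05, 0x3c, 0x1b, 0x11, 0x03, (byte) 0xe8, (byte) 0xb1};
        disassemble(code).forEach(System.out::println);
    }
}
```

Because sipush carries two operand bytes, the instruction after it starts three bytes later, which is why instruction offsets in a listing are not consecutive.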

Of the 256 possible byte-long opcodes, as of 2015, 202 are in use (~79%), 51 are reserved for future use (~20%), and 3 instructions (~1%) are permanently reserved for JVM implementations to use. [5] :6.2 Two of these (impdep1 and impdep2) are to provide traps for implementation-specific software and hardware, respectively. The third is used for debuggers to implement breakpoints.
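The opcode counts above can be checked with a few lines of arithmetic. The numeric values used for breakpoint, impdep1 and impdep2 (0xca, 0xfe, 0xff) come from the JVM specification's opcode table rather than from the text above, and the class OpcodeBudget is a hypothetical name for this sketch.

```java
// Hypothetical sketch: checks the opcode counts quoted for the Java SE 8 specification.
final class OpcodeBudget {
    static final int TOTAL = 256;        // every value a single byte can take
    static final int IN_USE = 202;       // opcodes 0x00 through 0xc9
    static final int BREAKPOINT = 0xca;  // reserved for debuggers
    static final int IMPDEP1 = 0xfe;     // reserved: implementation-specific trap
    static final int IMPDEP2 = 0xff;     // reserved: implementation-specific trap
    static final int PERMANENTLY_RESERVED = 3;

    /** Opcodes that are neither in use nor permanently reserved. */
    static int reservedForFuture() {
        return TOTAL - IN_USE - PERMANENTLY_RESERVED;
    }

    /** Share of the 256-value opcode space, rounded to a whole percent. */
    static long percent(int count) {
        return Math.round(100.0 * count / TOTAL);
    }

    public static void main(String[] args) {
        System.out.println("in use:   " + IN_USE + " (~" + percent(IN_USE) + "%)");
        System.out.println("future:   " + reservedForFuture() + " (~" + percent(reservedForFuture()) + "%)");
        System.out.println("reserved: " + PERMANENTLY_RESERVED + " (~" + percent(PERMANENTLY_RESERVED) + "%)");
    }
}
```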

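Instructions fall into a number of broad groups: load and store (e.g. aload_0, istore), arithmetic and logic (e.g. ladd, fcmpl), type conversion (e.g. i2b, d2i), object creation and manipulation (e.g. new, putfield), operand stack management (e.g. swap, dup2), control transfer (e.g. ifeq, goto), and method invocation and return (e.g. invokespecial, areturn).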

There are also a few instructions for a number of more specialized tasks such as exception throwing, synchronization, etc.

Many instructions have prefixes and/or suffixes referring to the types of operands they operate on. [5] :2.11.1 These are as follows:

Prefix/suffix    Operand type
i                integer
l                long
s                short
b                byte
c                character
f                float
d                double
a                reference

For example, iadd will add two integers, while dadd will add two doubles. The const, load, and store instructions may also take a suffix of the form _n, where n is a number from 0–3 for load and store. The maximum n for const differs by type.

The const instructions push a value of the specified type onto the stack. For example, iconst_5 will push an integer (32-bit value) with the value 5 onto the stack, while dconst_1 will push a double (64-bit floating-point value) with the value 1 onto the stack. There is also an aconst_null, which pushes a null reference. The n for the load and store instructions specifies the index in the local variable array to load from or store to. The aload_0 instruction pushes the reference held in local variable 0 onto the stack (in an instance method, this is the this reference). istore_1 stores the integer on the top of the stack into local variable 1. For local variables beyond 3 the suffix is dropped and the index must be given as an operand.
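The effect of these instructions on the operand stack and the local variable array can be sketched with a toy interpreter. This is a simplification under stated assumptions: instructions are given as mnemonic strings rather than bytes, only iconst_n, iload_n, istore_n and iadd are handled, and the class MiniStackMachine is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for four integer instructions.
final class MiniStackMachine {

    /** Runs the program and returns the final contents of local variables 0 to 3. */
    static int[] run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack
        int[] locals = new int[4];                   // local variables 0..3
        for (String insn : program) {
            if (insn.equals("iadd")) {
                int b = stack.pop();
                int a = stack.pop();
                stack.push(a + b);                   // pop two ints, push their sum
            } else if (insn.startsWith("iconst_")) {
                stack.push(Integer.parseInt(insn.substring("iconst_".length())));
            } else if (insn.startsWith("iload_")) {
                stack.push(locals[Integer.parseInt(insn.substring("iload_".length()))]);
            } else if (insn.startsWith("istore_")) {
                locals[Integer.parseInt(insn.substring("istore_".length()))] = stack.pop();
            } else {
                throw new IllegalArgumentException("not handled in this sketch: " + insn);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = run("iconst_5", "istore_1",
                           "iconst_2", "istore_2",
                           "iload_1", "iload_2", "iadd", "istore_3");
        System.out.println(java.util.Arrays.toString(locals));
    }
}
```

The sample program stores 5 and 2 in local variables 1 and 2, then loads both, adds them, and stores the sum in local variable 3.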

Example

Consider the following Java code:

    outer:
    for (int i = 2; i < 1000; i++) {
        for (int j = 2; j < i; j++) {
            if (i % j == 0)
                continue outer;
        }
        System.out.println(i);
    }

A Java compiler might translate the Java code above into bytecode as follows, assuming the above was put in a method:

    0:   iconst_2
    1:   istore_1
    2:   iload_1
    3:   sipush  1000
    6:   if_icmpge       44
    9:   iconst_2
    10:  istore_2
    11:  iload_2
    12:  iload_1
    13:  if_icmpge       31
    16:  iload_1
    17:  iload_2
    18:  irem
    19:  ifne    25
    22:  goto    38
    25:  iinc    2, 1
    28:  goto    11
    31:  getstatic       #84; // Field java/lang/System.out:Ljava/io/PrintStream;
    34:  iload_1
    35:  invokevirtual   #85; // Method java/io/PrintStream.println:(I)V
    38:  iinc    1, 1
    41:  goto    2
    44:  return
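To reproduce a listing of this kind, the loop can be placed in a small class, compiled with javac, and disassembled with the JDK's javap -c tool. The sketch below is one way to do that; the class name PrimeLoop and the method primesBelow are hypothetical, and the loop collects its results in a list so they can be checked, which means the generated bytecode, offsets and constant pool indices will differ from the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the example loop, so it can be compiled and inspected.
final class PrimeLoop {

    /** Collects every i in [2, limit) that no smaller j >= 2 divides evenly. */
    static List<Integer> primesBelow(int limit) {
        List<Integer> found = new ArrayList<>();
        outer:
        for (int i = 2; i < limit; i++) {
            for (int j = 2; j < i; j++) {
                if (i % j == 0)
                    continue outer;   // i has a divisor: move on to the next i
            }
            found.add(i);             // the original example prints i here
        }
        return found;
    }

    public static void main(String[] args) {
        for (int p : primesBelow(1000)) {
            System.out.println(p);
        }
    }
}
```

After compiling, running javap -c PrimeLoop prints the bytecode of each method in the class.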

Generation

The most common language targeting the Java virtual machine by producing Java bytecode is Java. Originally only one compiler existed, the javac compiler from Sun Microsystems, which compiles Java source code to Java bytecode; but because all the specifications for Java bytecode are now available, other parties have supplied compilers that produce Java bytecode.

Some projects provide Java assemblers to enable writing Java bytecode by hand. Assembly code may also be generated by machine, for example by a compiler targeting a Java virtual machine. Notable Java assemblers include Jasmin, [6] Jamaica, [7] Krakatau, [8] and Lilac. [9]

Others have developed compilers for different programming languages that target the Java virtual machine, such as Free Pascal. [10] [11]

Execution

There are several Java virtual machines available today to execute Java bytecode, both free and commercial products. If executing bytecode in a virtual machine is undesirable, a developer can also compile Java source code or bytecode directly to native machine code with tools such as the GNU Compiler for Java (GCJ). Some processors can execute Java bytecode natively. Such processors are termed Java processors.

Support for dynamic languages

The Java virtual machine provides some support for dynamically typed languages. Most of the extant JVM instruction set is statically typed, in the sense that method calls have their signatures type-checked at compile time, with no mechanism to defer this decision to run time or to choose the method dispatch by an alternative approach. [12]

JSR 292 (Supporting Dynamically Typed Languages on the Java Platform) [13] added a new invokedynamic instruction at the JVM level, to allow method invocation relying on dynamic type checking (instead of the extant statically type-checked invokevirtual instruction). The Da Vinci Machine is a prototype virtual machine implementation that hosts JVM extensions aimed at supporting dynamic languages. All JVMs supporting Java SE 7 also include the invokedynamic opcode.
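JSR 292 also introduced the java.lang.invoke package, whose method handles are what an invokedynamic call site is ultimately linked to. Java source code cannot emit invokedynamic directly, so the following sketch only illustrates the related idea of resolving a call target at run time with a MethodHandle; the class DynamicCallSketch and its method are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch: looks up String.length() at run time and calls it through a handle.
final class DynamicCallSketch {

    static int lengthViaHandle(String s) {
        try {
            // Resolve the target by name and type at run time instead of compile time.
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call-site types to match (String)int exactly.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException("method handle call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("bytecode"));
    }
}
```

In a language runtime built on the JVM, a bootstrap method would typically return a call site wrapping such a handle, which the JVM then links to the invokedynamic instruction.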

References

  1. "Java Virtual Machine Specification". Oracle. Retrieved 14 November 2023.
  2. Lindholm, Tim (2015). The Java Virtual Machine Specification. Oracle. ISBN 978-0133905908.
  3. Arnold, Ken (1996). "The Java Programming Language". Sun Microsystems. 1 (1): 30–40.
  4. "IBM Developer". developer.ibm.com. Retrieved 20 February 2006.
  5. Lindholm, Tim; Yellin, Frank; Bracha, Gilad; Buckley, Alex (13 February 2015). The Java Virtual Machine Specification (Java SE 8 ed.).
  6. Jasmin home page
  7. Jamaica: The Java virtual machine (JVM) macro assembler
  8. Krakatau home page
  9. Lilac home page
  10. Free Pascal 3.0 release notes
  11. Free Pascal JVM Target
  12. Nutter, Charles (3 January 2007). "InvokeDynamic: Actually Useful?". Retrieved 25 January 2008.
  13. See JSR 292.