Internet media type | application/java-vm, application/x-httpd-java, application/x-java, application/java, application/java-byte-code, application/x-java-class, application/x-java-vm |
---|---|
Developed by | Sun Microsystems |
A Java class file is a file (with the .class filename extension) containing Java bytecode that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from Java programming language source files (.java files) containing Java classes (alternatively, other JVM languages can also be used to create class files). If a source file has more than one class, each class is compiled into a separate class file. Thus, it is called a .class file because it contains the bytecode for a single class.
JVMs are available for many platforms, and a class file compiled on one platform will execute on a JVM of another platform. This makes Java applications platform-independent.
On 11 December 2006, the class file format was modified under Java Specification Request (JSR) 202. [1]
There are 10 basic sections to the Java class file structure:
Class files are identified by the following 4 byte header (in hexadecimal): CA FE BA BE
(the first 4 entries in the table below). The history of this magic number was explained by James Gosling referring to a restaurant in Palo Alto: [2]
"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."
Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types:
Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used. The overall layout of the class file is as shown in the following table.
Byte offset | Size | Type or value | Description |
---|---|---|---|
0 | 4 bytes | u1 = 0xCA hex | magic number (CAFEBABE) used to identify file as conforming to the class file format |
1 | u1 = 0xFE hex | ||
2 | u1 = 0xBA hex | ||
3 | u1 = 0xBE hex | ||
4 | 2 bytes | u2 | minor version number of the class file format being used |
5 | |||
6 | 2 bytes | u2 | major version number of the class file format being used. [3] Java SE 24 = 68 (0x44 hex), |
7 | |||
8 | 2 bytes | u2 | constant pool count, number of entries in the following constant pool table. This count is at least one greater than the actual number of entries; see following discussion. |
9 | |||
10 | cpsize (variable) | table | constant pool table, an array of variable-sized constant pool entries, containing items such as literal numbers, strings, and references to classes or methods. Indexed starting at 1, containing (constant pool count - 1) number of entries in total (see note). |
... | |||
... | |||
... | |||
10+cpsize | 2 bytes | u2 | access flags, a bitmask |
11+cpsize | |||
12+cpsize | 2 bytes | u2 | identifies this class, index into the constant pool to a "Class"-type entry |
13+cpsize | |||
14+cpsize | 2 bytes | u2 | identifies super class, index into the constant pool to a "Class"-type entry |
15+cpsize | |||
16+cpsize | 2 bytes | u2 | interface count, number of entries in the following interface table |
17+cpsize | |||
18+cpsize | isize (variable) | table | interface table: a variable-length array of constant pool indexes describing the interfaces implemented by this class |
... | |||
... | |||
... | |||
18+cpsize+isize | 2 bytes | u2 | field count, number of entries in the following field table |
19+cpsize+isize | |||
20+cpsize+isize | fsize (variable) | table | field table, variable length array of fields each element is a field_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.5 |
... | |||
... | |||
... | |||
20+cpsize+isize+fsize | 2 bytes | u2 | method count, number of entries in the following method table |
21+cpsize+isize+fsize | |||
22+cpsize+isize+fsize | msize (variable) | table | method table, variable length array of methods each element is a method_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.6 |
... | |||
... | |||
... | |||
22+cpsize+isize+fsize+msize | 2 bytes | u2 | attribute count, number of entries in the following attribute table |
23+cpsize+isize+fsize+msize | |||
24+cpsize+isize+fsize+msize | asize (variable) | table | attribute table, variable length array of attributes each element is an attribute_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7 |
... | |||
... | |||
... |
Since C doesn't support multiple variable length arrays within a struct, the code below won't compile and only serves as a demonstration.
structClass_File_Format{u4magic_number;u2minor_version;u2major_version;u2constant_pool_count;cp_infoconstant_pool[constant_pool_count-1];u2access_flags;u2this_class;u2super_class;u2interfaces_count;u2interfaces[interfaces_count];u2fields_count;field_infofields[fields_count];u2methods_count;method_infomethods[methods_count];u2attributes_count;attribute_infoattributes[attributes_count];}
The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).
Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), but the count should actually be interpreted as the maximum index plus one. [6] Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.
The type of each item (constant) in the constant pool is identified by an initial byte tag. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are:
Tag byte | Additional bytes | Description of constant | Version introduced |
---|---|---|---|
1 | 2+x bytes (variable) | UTF-8 (Unicode) string: a character string prefixed by a 16-bit number (type u2) indicating the number of bytes in the encoded string which immediately follows (which may be different than the number of characters). Note that the encoding used is not actually UTF-8, but involves a slight modification of the Unicode standard encoding form. | 1.0.2 |
3 | 4 bytes | Integer: a signed 32-bit two's complement number in big-endian format | 1.0.2 |
4 | 4 bytes | Float: a 32-bit single-precision IEEE 754 floating-point number | 1.0.2 |
5 | 8 bytes | Long: a signed 64-bit two's complement number in big-endian format (takes two slots in the constant pool table) | 1.0.2 |
6 | 8 bytes | Double: a 64-bit double-precision IEEE 754 floating-point number (takes two slots in the constant pool table) | 1.0.2 |
7 | 2 bytes | Class reference: an index within the constant pool to a UTF-8 string containing the fully qualified class name (in internal format) (big-endian) | 1.0.2 |
8 | 2 bytes | String reference: an index within the constant pool to a UTF-8 string (big-endian too) | 1.0.2 |
9 | 4 bytes | Field reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
10 | 4 bytes | Method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
11 | 4 bytes | Interface method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
12 | 4 bytes | Name and type descriptor: two indexes to UTF-8 strings within the constant pool, the first representing a name (identifier) and the second a specially encoded type descriptor. | 1.0.2 |
15 | 3 bytes | Method handle: this structure is used to represent a method handle and consists of one byte of type descriptor, followed by an index within the constant pool. [6] | 7 |
16 | 2 bytes | Method type: this structure is used to represent a method type, and consists of an index within the constant pool. [6] | 7 |
17 | 4 bytes | Dynamic: this is used to specify a dynamically computed constant produced by invocation of a bootstrap method. [6] | 11 |
18 | 4 bytes | InvokeDynamic: this is used by an invokedynamic instruction to specify a bootstrap method, the dynamic invocation name, the argument and return types of the call, and optionally, a sequence of additional constants called static arguments to the bootstrap method. [6] | 7 |
19 | 2 bytes | Module: this is used to identify a module. [6] | 9 |
20 | 2 bytes | Package: this is used to identify a package exported or opened by a module. [6] | 9 |
There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, and short must be represented as an integer constant.
Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see UTF-8 for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence C0 80
(in hex) instead of the standard single-byte encoding 00
. The second difference is that supplementary characters (those outside the BMP at U+10000 and above) are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E
, rather than the correct 4-byte UTF-8 encoding of F0 9D 84 9E
.
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers write once, run anywhere (WORA), meaning that compiled Java code can run on all platforms that support Java without the need to recompile. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of the underlying computer architecture. The syntax of Java is similar to C and C++, but has fewer low-level facilities than either of them. The Java runtime provides dynamic capabilities that are typically not available in traditional compiled languages.
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.
Java Platform, Standard Edition is a computing platform for development and deployment of portable code for desktop and server environments. Java SE was formerly known as Java 2 Platform, Standard Edition (J2SE).
Java and C++ are two prominent object-oriented programming languages. By many language popularity metrics, the two languages have dominated object-oriented and high-performance software development for much of the 21st century, and are often directly compared and contrasted. Java's syntax was based on C/C++.
In computing, the Java API for XML Processing (JAXP), one of the Java XML application programming interfaces (APIs), provides the capability of validating and parsing XML documents. It has three basic parsing interfaces:
In software design, the Java Native Interface (JNI) is a foreign function interface programming framework that enables Java code running in a Java virtual machine (JVM) to call and be called by native applications and libraries written in other languages such as C, C++ and assembly.
This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries. For a more detailed comparison of the platforms, see Comparison of the Java and .NET platforms.
The syntax of Java is the set of rules defining how a Java program is written and interpreted.
java.nio is a collection of Java programming language APIs that offer features for intensive I/O operations. It was introduced with the J2SE 1.4 release of Java by Sun Microsystems to complement an existing standard I/O. NIO was developed under the Java Community Process as JSR 51. An extension to NIO that offers a new file system API, called NIO.2, was released with Java SE 7 ("Dolphin").
In the Java computer programming language, an annotation is a form of syntactic metadata that can be added to Java source code. Classes, methods, variables, parameters and Java packages may be annotated. Like Javadoc tags, Java annotations can be read from source files. Unlike Javadoc tags, Java annotations can also be embedded in and read from Java class files generated by the Java compiler. This allows annotations to be retained by the Java virtual machine at run-time and read via reflection. It is possible to create meta-annotations out of the existing ones in Java.
Java is a set of computer software and specifications that provides a software platform for developing application software and deploying it in a cross-platform computing environment. Java is used in a wide variety of computing platforms from embedded devices and mobile phones to enterprise servers and supercomputers. Java applets, which are less common than standalone Java applications, were commonly run in secure, sandboxed environments to provide many features of native applications through being embedded in HTML pages.
.properties is a file extension for files mainly used in Java-related technologies to store the configurable parameters of an application. They can also be used for storing strings for Internationalization and localization; these are known as Property Resource Bundles.
The Java programming language and Java software platform have been criticized for design choices including the implementation of generics, forced object-oriented programming, the handling of unsigned numbers, the implementation of floating-point arithmetic, and a history of security vulnerabilities in the primary Java VM implementation, HotSpot. Software written in Java, especially its early versions, has been criticized for its performance compared to software written in other programming languages. Developers have also remarked that differences in various Java implementations must be taken into account when writing complex Java programs that must work with all of them.
In software development, the programming language Java was historically considered slower than the fastest third-generation typed languages such as C and C++. In contrast to those languages, Java compiles by default to a Java Virtual Machine (JVM) with operations distinct from those of the actual computer hardware. Early JVM implementations were interpreters; they simulated the virtual operations one-by-one rather than translating them into machine code for direct hardware execution.
Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.
InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.
This article compares the application programming interfaces (APIs) and virtual machines (VMs) of the programming language Java and operating system Android.
Java bytecode is the instruction set of the Java virtual machine (JVM), the language to which Java and other JVM-compatible source code is compiled. Each instruction is represented by a single byte, hence the name bytecode, making it a compact form of data.
Java code coverage tools are of two types: first, tools that add statements to the Java source code and require its recompilation. Second, tools that instrument the bytecode, either before or during execution. The goal is to find out which parts of the code are tested by registering the lines of code executed when running a test.
Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by Carsten Bormann and Paul Hoffman. Like JSON it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human readability. It is defined in IETF RFC 8949.