Just-in-time compilation

Last updated

In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) [1] is compilation (of computer code) during execution of a program (at run time) rather than before execution. [2] This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

Contents

JIT compilation is a combination of the two traditional approaches to translation to machine code ahead-of-time compilation (AOT), and interpretation and combines some advantages and drawbacks of both. [2] Roughly, JIT compilation combines the speed of compiled code with the flexibility of interpretation, with the overhead of an interpreter and the additional overhead of compiling and linking (not just interpreting). JIT compilation is a form of dynamic compilation, and allows adaptive optimization such as dynamic recompilation and microarchitecture-specific speedups. [nb 1] [3] Interpretation and JIT compilation are particularly suited for dynamic programming languages, as the runtime system can handle late-bound data types and enforce security guarantees.

History

The earliest published JIT compiler is generally attributed to work on LISP by John McCarthy in 1960. [4] In his seminal paper Recursive functions of symbolic expressions and their computation by machine, Part I, he mentions functions that are translated during runtime, thereby sparing the need to save the compiler output to punch cards [5] (although this would be more accurately known as a "Compile and go system"). Another early example was by Ken Thompson, who in 1968 gave one of the first applications of regular expressions, here for pattern matching in the text editor QED. [6] For speed, Thompson implemented regular expression matching by JITing to IBM 7094 code on the Compatible Time-Sharing System. [4] An influential technique for deriving compiled code from interpretation was pioneered by James G. Mitchell in 1970, which he implemented for the experimental language LC². [7] [8]

Smalltalk (c. 1983) pioneered new aspects of JIT compilations. For example, translation to machine code was done on demand, and the result was cached for later use. When memory became scarce, the system would delete some of this code and regenerate it when it was needed again. [2] [9] Sun's Self language improved these techniques extensively and was at one point the fastest Smalltalk system in the world, achieving up to half the speed of optimized C [10] but with a fully object-oriented language.

Self was abandoned by Sun, but the research went into the Java language. The term "Just-in-time compilation" was borrowed from the manufacturing term "Just in time" and popularized by Java, with James Gosling using the term from 1993. [11] Currently JITing is used by most implementations of the Java Virtual Machine, as HotSpot builds on, and extensively uses, this research base.

The HP project Dynamo was an experimental JIT compiler where the "bytecode" format and the machine code format were the same; the system optimized PA-8000 machine code. [12] Counterintuitively, this resulted in speed ups, in some cases of 30% since doing this permitted optimizations at the machine code level, for example, inlining code for better cache usage and optimizations of calls to dynamic libraries and many other run-time optimizations which conventional compilers are not able to attempt. [13] [14]

In November 2020, PHP 8.0 introduced a JIT compiler. [15]

Design

In a bytecode-compiled system, source code is translated to an intermediate representation known as bytecode. Bytecode is not the machine code for any particular computer, and may be portable among computer architectures. The bytecode may then be interpreted by, or run on a virtual machine. The JIT compiler reads the bytecodes in many sections (or in full, rarely) and compiles them dynamically into machine code so the program can run faster. This can be done per-file, per-function or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name "just-in-time"), and then cached and reused later without needing to be recompiled.

By contrast, a traditional interpreted virtual machine will simply interpret the bytecode, generally with much lower performance. Some interpreters even interpret source code, without the step of first compiling to bytecode, with even worse performance. Statically-compiled code or native code is compiled prior to deployment. A dynamic compilation environment is one in which the compiler can be used during execution. A common goal of using JIT techniques is to reach or surpass the performance of static compilation, while maintaining the advantages of bytecode interpretation: Much of the "heavy lifting" of parsing the original source code and performing basic optimization is often handled at compile time, prior to deployment: compilation from bytecode to machine code is much faster than compiling from source. The deployed bytecode is portable, unlike native code. Since the runtime has control over the compilation, like interpreted bytecode, it can run in a secure sandbox. Compilers from bytecode to machine code are easier to write, because the portable bytecode compiler has already done much of the work.

JIT code generally offers far better performance than interpreters. In addition, it can in some cases offer better performance than static compilation, as many optimizations are only feasible at run-time: [16] [17]

  1. The compilation can be optimized to the targeted CPU and the operating system model where the application runs. For example, JIT can choose SSE2 vector CPU instructions when it detects that the CPU supports them. To obtain this level of optimization specificity with a static compiler, one must either compile a binary for each intended platform/architecture, or else include multiple versions of portions of the code within a single binary.
  2. The system is able to collect statistics about how the program is actually running in the environment it is in, and it can rearrange and recompile for optimum performance. However, some static compilers can also take profile information as input.
  3. The system can do global code optimizations (e.g. inlining of library functions) without losing the advantages of dynamic linking and without the overheads inherent to static compilers and linkers. Specifically, when doing global inline substitutions, a static compilation process may need run-time checks and ensure that a virtual call would occur if the actual class of the object overrides the inlined method, and boundary condition checks on array accesses may need to be processed within loops. With just-in-time compilation in many cases this processing can be moved out of loops, often giving large increases of speed.
  4. Although this is possible with statically compiled garbage collected languages, a bytecode system can more easily rearrange executed code for better cache utilization.

Because a JIT must render and execute a native binary image at runtime, true machine-code JITs necessitate platforms that allow for data to be executed at runtime, making using such JITs on a Harvard architecture-based machine impossible; the same can be said for certain operating systems and virtual machines as well. However, a special type of "JIT" may potentially not target the physical machine's CPU architecture, but rather an optimized VM bytecode where limitations on raw machine code prevail, especially where that bytecode's VM eventually leverages a JIT to native code. [18]

Performance

JIT causes a slight to noticeable delay in the initial execution of an application, due to the time taken to load and compile the input code. Sometimes this delay is called "startup time delay" or "warm-up time". In general, the more optimization JIT performs, the better the code it will generate, but the initial delay will also increase. A JIT compiler therefore has to make a trade-off between the compilation time and the quality of the code it hopes to generate. Startup time can include increased IO-bound operations in addition to JIT compilation: for example, the rt.jar class data file for the Java Virtual Machine (JVM) is 40 MB and the JVM must seek a lot of data in this contextually huge file. [19]

One possible optimization, used by Sun's HotSpot Java Virtual Machine, is to combine interpretation and JIT compilation. The application code is initially interpreted, but the JVM monitors which sequences of bytecode are frequently executed and translates them to machine code for direct execution on the hardware. For bytecode which is executed only a few times, this saves the compilation time and reduces the initial latency; for frequently executed bytecode, JIT compilation is used to run at high speed, after an initial phase of slow interpretation. Additionally, since a program spends most time executing a minority of its code, the reduced compilation time is significant. Finally, during the initial code interpretation, execution statistics can be collected before compilation, which helps to perform better optimization. [20]

The correct tradeoff can vary due to circumstances. For example, Sun's Java Virtual Machine has two major modes—client and server. In client mode, minimal compilation and optimization is performed, to reduce startup time. In server mode, extensive compilation and optimization is performed, to maximize performance once the application is running by sacrificing startup time. Other Java just-in-time compilers have used a runtime measurement of the number of times a method has executed combined with the bytecode size of a method as a heuristic to decide when to compile. [21] Still another uses the number of times executed combined with the detection of loops. [22] In general, it is much harder to accurately predict which methods to optimize in short-running applications than in long-running ones. [23]

Native Image Generator (Ngen) by Microsoft is another approach at reducing the initial delay. [24] Ngen pre-compiles (or "pre-JITs") bytecode in a Common Intermediate Language image into machine native code. As a result, no runtime compilation is needed. .NET Framework 2.0 shipped with Visual Studio 2005 runs Ngen on all of the Microsoft library DLLs right after the installation. Pre-jitting provides a way to improve the startup time. However, the quality of code it generates might not be as good as the one that is JITed, for the same reasons why code compiled statically, without profile-guided optimization, cannot be as good as JIT compiled code in the extreme case: the lack of profiling data to drive, for instance, inline caching. [25]

There also exist Java implementations that combine an AOT (ahead-of-time) compiler with either a JIT compiler (Excelsior JET) or interpreter (GNU Compiler for Java).

JIT compilation may not reliably achieve its goal, namely entering a steady state of improved performance after a short initial warmup period. [26] [27] Across eight different virtual machines, Barrett et al. (2017) measured six widely-used microbenchmarks which are commonly used by virtual machine implementors as optimisation targets, running them repeatedly within a single process execution. [28] On Linux, they found that 8.7% to 9.6% of process executions failed to reach a steady state of performance, 16.7% to 17.9% entered a steady state of reduced performance after a warmup period, and 56.5% pairings of a specific virtual machine running a specific benchmark failed to consistently see a steady-state non-degradation of performance across multiple executions (i.e., at least one execution failed to reach a steady state or saw reduced performance in the steady state). Even where an improved steady-state was reached, it sometimes took many hundreds of iterations. [29] Traini et al. (2022) instead focused on the HotSpot virtual machine but with a much wider array of benchmarks, [30] finding that 10.9% of process executions failed to reach a steady state of performance, and 43.5% of benchmarks did not consistently attain a steady state across multiple executions. [31]

Security

JIT compilation fundamentally uses executable data, and thus poses security challenges and possible exploits.

Implementation of JIT compilation consists of compiling source code or byte code to machine code and executing it. This is generally done directly in memory: the JIT compiler outputs the machine code directly into memory and immediately executes it, rather than outputting it to disk and then invoking the code as a separate program, as in usual ahead of time compilation. In modern architectures this runs into a problem due to executable space protection: arbitrary memory cannot be executed, as otherwise there is a potential security hole. Thus memory must be marked as executable; for security reasons this should be done after the code has been written to memory, and marked read-only, as writable/executable memory is a security hole (see W^X). [32] For instance Firefox's JIT compiler for Javascript introduced this protection in a release version with Firefox 46. [33]

JIT spraying is a class of computer security exploits that use JIT compilation for heap spraying: the resulting memory is then executable, which allows an exploit if execution can be moved into the heap.

Uses

JIT compilation can be applied to some programs, or can be used for certain capacities, particularly dynamic capacities such as regular expressions. For example, a text editor may compile a regular expression provided at runtime to machine code to allow faster matching: this cannot be done ahead of time, as the pattern is only provided at runtime. Several modern runtime environments rely on JIT compilation for high-speed code execution, including most implementations of Java, together with Microsoft's .NET. Similarly, many regular-expression libraries feature JIT compilation of regular expressions, either to bytecode or to machine code. JIT compilation is also used in some emulators, in order to translate machine code from one CPU architecture to another.

A common implementation of JIT compilation is to first have AOT compilation to bytecode (virtual machine code), known as bytecode compilation, and then have JIT compilation to machine code (dynamic compilation), rather than interpretation of the bytecode. This improves the runtime performance compared to interpretation, at the cost of lag due to compilation. JIT compilers translate continuously, as with interpreters, but caching of compiled code minimizes lag on future execution of the same code during a given run. Since only part of the program is compiled, there is significantly less lag than if the entire program were compiled prior to execution.

See also

Notes

  1. Ahead-of-Time compilers can target specific microarchitectures as well, but the difference between AOT and JIT in that matter is one of portability. A JIT can render code tailored to the currently running CPU at runtime, whereas an AOT, in lieu of optimizing for a generalized subset of uarches, must know the target CPU in advance: such code may not only be not performant on other CPU types but may be outright unstable.

Related Research Articles

<span class="mw-page-title-main">Java virtual machine</span> Virtual machine that runs Java programs

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. CIL instructions are executed by a CIL-compatible runtime environment such as the Common Language Runtime. Languages which target the CLI compile to CIL. CIL is object-oriented, stack-based bytecode. Runtimes typically just-in-time compile CIL instructions into native code.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter's virtual machine.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

In computer science, dynamic recompilation is a feature of some emulators and virtual machines, where the system may recompile some part of a program during execution. By compiling during execution, the system can tailor the generated code to reflect the program's run-time environment, and potentially produce more efficient code by exploiting information that is not available to a traditional static compiler.

Execution in computer and software engineering is the process by which a computer or virtual machine interprets and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a "fetch–decode–execute" cycle for each instruction done by the control unit. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

HotSpot, released as Java HotSpot Performance Engine, is a Java virtual machine for desktop and server computers, developed by Sun Microsystems which was purchased by and became a division of Oracle Corporation in 2010. Its features improved performance via methods such as just-in-time compilation and adaptive optimization. It is the de facto Java Virtual Machine, serving as the reference implementation of the Java programming language.

Jazelle DBX is an extension that allows some ARM processors to execute Java bytecode in hardware as a third execution state alongside the existing ARM and Thumb modes. Jazelle functionality was specified in the ARMv5TEJ architecture and the first processor with Jazelle technology was the ARM926EJ-S. Jazelle is denoted by a "J" appended to the CPU name, except for post-v5 cores where it is required for architecture conformance.

Application virtualization software refers to both application virtual machines and software responsible for implementing them. Application virtual machines are typically used to allow application bytecode to run portably on many different computer architectures and operating systems. The application is usually run on the computer using an interpreter or just-in-time compilation (JIT). There are often several implementations of a given virtual machine, each covering a different set of functions.

In computer programming, a programming language implementation is a system for executing computer programs. There are two general approaches to programming language implementation:

In software development, the programming language Java was historically considered slower than the fastest third-generation typed languages such as C and C++. In contrast to those languages, Java compiles by default to a Java Virtual Machine (JVM) with operations distinct from those of the actual computer hardware. Early JVM implementations were interpreters; they simulated the virtual operations one-by-one rather than translating them into machine code for direct hardware execution.

In computer science, ahead-of-time compilation is the act of compiling an (often) higher-level programming language into an (often) lower-level language before execution of a program, usually at build-time, to reduce the amount of work needed to be performed at run time.

Eclipse OpenJ9 is a high performance, scalable, Java virtual machine (JVM) implementation that is fully compliant with the Java Virtual Machine Specification.

Dalvik is a discontinued process virtual machine (VM) in the Android operating system that executes applications written for Android. Dalvik was an integral part of the Android software stack in the Android versions 4.4 "KitKat" and earlier, which were commonly used on mobile devices such as mobile phones and tablet computers, and more in some devices such as smart TVs and wearables. Dalvik is open-source software, originally written by Dan Bornstein, who named it after the fishing village of Dalvík in Eyjafjörður, Iceland.

Profile-guided optimization, also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to improve program runtime performance.

Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.

Java bytecode is the instruction set of the Java virtual machine (JVM), the language to which Java and other JVM-compatible source code is compiled. Each instruction is represented by a single byte, hence the name bytecode, making it a compact form of data.

<span class="mw-page-title-main">GraalVM</span> Virtual machine software

GraalVM is a Java Development Kit (JDK) written in Java. The open-source distribution of GraalVM is based on OpenJDK, and the enterprise distribution is based on Oracle JDK. As well as just-in-time (JIT) compilation, GraalVM can compile a Java application ahead of time. This allows for faster initialization, greater runtime performance, and decreased resource consumption, but the resulting executable can only run on the platform it was compiled for.

HipHop Virtual Machine (HHVM) is an open-source virtual machine based on just-in-time (JIT) compilation that serves as an execution engine for the Hack programming language. By using the principle of JIT compilation, Hack code is first transformed into intermediate HipHop bytecode (HHBC), which is then dynamically translated into x86-64 machine code, optimized, and natively executed. This contrasts with PHP's usual interpreted execution, in which the Zend Engine transforms PHP source code into opcodes that serve as a form of bytecode, and executes the opcodes directly on the Zend Engine's virtual CPU.

Android Runtime (ART) is an application runtime environment used by the Android operating system. Replacing Dalvik, the process virtual machine originally used by Android, ART performs the translation of the application's bytecode into native instructions that are later executed by the device's runtime environment.

References

  1. Languages, Compilers, and Runtime Systems, University of Michigan, Computer Science and Engineering, retrieved March 15, 2018
  2. 1 2 3 Aycock 2003.
  3. "Does the JIT take advantage of my CPU?". David Notario's WebLog. Retrieved 2018-12-03.
  4. 1 2 Aycock 2003, 2. JIT Compilation Techniques, 2.1 Genesis, p. 98.
  5. McCarthy, J. (April 1960). "Recursive functions of symbolic expressions and their computation by machine, Part I". Communications of the ACM. 3 (4): 184–195. CiteSeerX   10.1.1.111.8833 . doi:10.1145/367177.367199. S2CID   1489409.
  6. Thompson 1968.
  7. Aycock 2003, 2. JIT Compilation Techniques, 2.2 LC², p. 98–99.
  8. Mitchell, J.G. (1970). "The design and construction of flexible and efficient interactive programming systems".{{cite journal}}: Cite journal requires |journal= (help)
  9. Deutsch, L.P.; Schiffman, A.M. (1984). "Efficient implementation of the smalltalk-80 system" (PDF). Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages - POPL '84. pp. 297–302. doi:10.1145/800017.800542. ISBN   0-89791-125-3. S2CID   3045432. Archived from the original (PDF) on 2004-06-18.
  10. "97-pep.ps". research.sun.com. Archived from the original on 24 November 2006. Retrieved 15 January 2022.
  11. Aycock 2003, 2.14 Java, p. 107, footnote 13.
  12. "Dynamo: A Transparent Dynamic Optimization System". Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia. PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation. pages 1 to 12. DOI 10.1145/349299.349303. Retrieved March 28, 2012
  13. John Jannotti. "HP's Dynamo". Ars Technica. Retrieved 2013-07-05.
  14. "The HP Dynamo Project". Archived from the original on October 19, 2002. Retrieved 2016-04-12.{{cite web}}: CS1 maint: unfit URL (link)
  15. Tung, Liam (27 November 2020). "Programming language PHP 8 is out: This new JIT compiler points to better performance". ZDNet . Retrieved 28 November 2020.
  16. Croce, Louis. "Just in Time Compilation" (PDF). Columbia University. Archived from the original (PDF) on 2018-05-03.
  17. "What are the advantages of JIT vs. AOT compilation". Stack Overflow. Jan 21, 2010.
  18. "Compile a JIT based lang to Webassembly". Stack Overflow. Retrieved 2018-12-04.
  19. Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  20. "The Java HotSpot Performance Engine Architecture". Oracle.com. Retrieved 2013-07-05.
  21. Schilling, Jonathan L. (February 2003). "The simplest heuristics may be the best in Java JIT compilers" (PDF). SIGPLAN Notices . 38 (2): 36–46. doi:10.1145/772970.772975. S2CID   15117148. Archived from the original (PDF) on 2015-09-24.
  22. Toshio Suganuma, Toshiaki Yasue, Motohiro Kawahito, Hideaki Komatsu, Toshio Nakatani, "A dynamic optimization framework for a Java just-in-time compiler", Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA '01), pp. 180–195, October 14–18, 2001.
  23. Matthew Arnold, Michael Hind, Barbara G. Ryder, "An Empirical Study of Selective Optimization", Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers, pp. 49–67, August 10–12, 2000.
  24. "Native Image Generator (Ngen.exe)". Msdn2.microsoft.com. 5 December 2006. Retrieved 2013-07-05.
  25. Sweeney, Arnold (February 2005). "A Survey of Adaptive Optimization in Virtual Machines" (PDF). Proceedings of the IEEE . 92 (2): 449–466. Archived from the original (PDF) on 2016-06-29.
  26. Barrett et al. 2017, p. 3.
  27. Traini et al. 2022, p. 1.
  28. Barrett et al. 2017, p. 5-6.
  29. Barrett et al. 2017, p. 12-13.
  30. Traini et al. 2022, p. 17-23.
  31. Traini et al. 2022, p. 26-29.
  32. "How to JIT – an introduction", Eli Bendersky, November 5th, 2013 at 5:59 am
  33. De Mooij, Jan. "W^X JIT-code enabled in Firefox". Jan De Mooij. Retrieved 11 May 2016.

Bibliography