Programming language implementation

Last updated

In computer programming, a programming language implementation is a system for executing computer programs. There are two general approaches to programming language implementation: [1]

Contents

Interpreter

An interpreter is composed of two parts: a parser and an evaluator. After a program is read as input by an interpreter, it is processed by the parser. The parser breaks the program into language components to form a parse tree. The evaluator then uses the parse tree to execute the program. [3]

Virtual machine

A virtual machine is a special type of interpreter that interprets bytecode. [2] Bytecode is a portable low-level code similar to machine code, though it is generally executed on a virtual machine instead of a physical machine. [4] To improve their efficiencies, many programming languages such as Java, [4] Python, [5] and C# [6] are compiled to bytecode before being interpreted.

Just-in-time compiler

Some virtual machines include a just-in-time (JIT) compiler to improve the efficiency of bytecode execution. While the bytecode is being executed by the virtual machine, if the JIT compiler determines that a portion of the bytecode will be used repeatedly, it compiles that particular portion to machine code. The JIT compiler then stores the machine code in memory so that it can be used by the virtual machine. JIT compilers try to strike a balance between longer compilation time and faster execution time. [2]

Compiler

A compiler translates a program written in one language into another language. Most compilers are organized into three stages: a front end, an optimizer, and a back end. The front end is responsible for understanding the program. It makes sure the program is valid and transforms it into an intermediate representation, a data structure used by the compiler to represent the program. The optimizer improves the intermediate representation to increase the speed or reduce the size of the executable which is ultimately produced by the compiler. The back end converts the optimized intermediate representation into the output language of the compiler. [7]

If a compiler of a given high level language produces another high level language, it is called a transpiler. Transpilers can be used to extend existing languages or to simplify compiler development by exploiting portable and well-optimized implementations of other languages (such as C). [2]

Many combinations of interpretation and compilation are possible, and many modern programming language implementations include elements of both. For example, the Smalltalk programming language is conventionally implemented by compilation into bytecode, which is then either interpreted or compiled by a virtual machine. Since Smalltalk bytecode is run on a virtual machine, it is portable across different hardware platforms. [8]

Multiple implementations

Programming languages can have multiple implementations. Different implementations can be written in different languages and can use different methods to compile or interpret code. For example, implementations of Python include: [9]

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Java virtual machine</span> Virtual machine that runs Java programs

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

In computer programming, a p-code machine is a virtual machine designed to execute p-code. This term is applied both generically to all such machines, and to specific implementations, the most famous being the p-Machine of the Pascal-P system, particularly the UCSD Pascal implementation, among whose developers, the p in p-code was construed to mean pseudo more often than portable, thus pseudo-code meaning instructions for a pseudo-machine.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter's Virtual Machine.

Parrot is a discontinued register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and Parrot intermediate representation to Parrot bytecode and execute it. Parrot is free and open-source software.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

A compiled language is a programming language whose implementations are typically compilers, and not interpreters.

In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language elements, be easier to use, or may automate significant areas of computing systems, making the process of developing a program simpler and more understandable than when using a lower-level language. The amount of abstraction provided defines how "high-level" a programming language is.

In computing, just-in-time (JIT) compilation is compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

HotSpot, released as Java HotSpot Performance Engine, is a Java virtual machine for desktop and server computers, developed by Sun Microsystems and now maintained and distributed by Oracle Corporation. It features improved performance via methods such as just-in-time compilation and adaptive optimization. It is the de facto Java Virtual Machine, serving as the reference implementation of the Java programming language.

Application virtualization software refers to both application virtual machines and software responsible for implementing them. Application virtual machines are typically used to allow application bytecode to run portably on many different computer architectures and operating systems. The application is usually run on the computer using an interpreter or just-in-time compilation (JIT). There are often several implementations of a given virtual machine, each covering a different set of functions.

In computer science, ahead-of-time compilation is the act of compiling an (often) higher-level programming language into an (often) lower-level language before execution of a program, usually at build-time, to reduce the amount of work needed to be performed at run time.

Dalvik is a discontinued process virtual machine (VM) in the Android operating system that executes applications written for Android. Dalvik was an integral part of the Android software stack in the Android versions 4.4 "KitKat" and earlier, which were commonly used on mobile devices such as mobile phones and tablet computers, and more in some devices such as smart TVs and wearables. Dalvik is open-source software, originally written by Dan Bornstein, who named it after the fishing village of Dalvík in Eyjafjörður, Iceland.

Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.

Java bytecode is the instruction set of the Java virtual machine (JVM), crucial for executing programs written in the Java language and other JVM-compatible languages. Each bytecode operation in the JVM is represented by a single byte, hence the name "bytecode", making it a compact form of instruction. This intermediate form enables Java programs to be platform-independent, as they are compiled not to native machine code but to a universally executable format across different JVM implementations.

<span class="mw-page-title-main">GraalVM</span> Virtual machine software

GraalVM is a Java Development Kit (JDK) written in Java. The open-source distribution of GraalVM is based on OpenJDK, and the enterprise distribution is based on Oracle JDK. As well as just-in-time (JIT) compilation, GraalVM can compile a Java application ahead of time. This allows for faster initialization, greater runtime performance, and decreased resource consumption, but the resulting executable can only run on the platform it was compiled for. It provides additional programming languages and execution modes. The first production-ready release, GraalVM 19.0, was distributed in May 2019. The most recent release is GraalVM for JDK 22, made available in March 2024.

HipHop Virtual Machine (HHVM) is an open-source virtual machine based on just-in-time (JIT) compilation that serves as an execution engine for the Hack programming language. By using the principle of JIT compilation, Hack code is first transformed into intermediate HipHop bytecode (HHBC), which is then dynamically translated into x86-64 machine code, optimized, and natively executed. This contrasts with PHP's usual interpreted execution, in which the Zend Engine transforms PHP source code into opcodes that serve as a form of bytecode, and executes the opcodes directly on the Zend Engine's virtual CPU.

A high-level language computer architecture (HLLCA) is a computer architecture designed to be targeted by a specific high-level programming language (HLL), rather than the architecture being dictated by hardware considerations. It is accordingly also termed language-directed computer design, coined in McKeeman (1967) and primarily used in the 1960s and 1970s. HLLCAs were popular in the 1960s and 1970s, but largely disappeared in the 1980s. This followed the dramatic failure of the Intel 432 (1981) and the emergence of optimizing compilers and reduced instruction set computer (RISC) architectures and RISC-like complex instruction set computer (CISC) architectures, and the later development of just-in-time compilation (JIT) for HLLs. A detailed survey and critique can be found in Ditzel & Patterson (1980).

References

  1. Ranta, Aarne (February 6, 2012). Implementing Programming Languages (PDF). College Publications. pp. 16–18. ISBN   9781848900646. Archived (PDF) from the original on Nov 7, 2020. Retrieved 22 March 2020.
  2. 1 2 3 4 5 Baker, Greg. "Language Implementations". Computing Science - Simon Fraser University. Archived from the original on Mar 8, 2019. Retrieved 22 March 2020.
  3. Evans, David (19 August 2011). Introduction to Computing (PDF). University of Virginia. p. 211. Retrieved 22 March 2020.
  4. 1 2 Sridhar, Jay (Aug 29, 2017). "Why the Java Virtual Machine Helps Your Code Run Better". MakeUseOf. Retrieved 22 March 2020.
  5. Bennett, James (April 23, 2018). "An introduction to Python bytecode". Opensource.com. Retrieved 22 March 2020.
  6. Ali, Mirza Farrukh (Oct 12, 2017). "Common Language Runtime(CLR) DotNet". Medium. Retrieved 22 March 2020.
  7. Cooper, Keith; Torczon, Linda (7 February 2011). Engineering a Compiler (2nd ed.). Morgan Kaufmann. pp.  6-9. ISBN   9780120884780.
  8. Lewis, Simon (May 11, 1995). The Art and Science of Smalltalk (PDF). Prentice Hall. pp. 20–21. ISBN   9780133713459 . Retrieved 23 March 2020.
  9. "Alternative Python Implementations". Python.org. Retrieved 23 March 2020.