P-code machine

Last updated

In computer programming, a p-code machine (portable code machine [1] ) is a virtual machine designed to execute p-code (the assembly language or machine code of a hypothetical central processing unit (CPU)). This term is applied both generically to all such machines (such as the Java virtual machine (JVM) and MATLAB pre-compiled code), and to specific implementations, the most famous being the p-Machine of the Pascal-P system, particularly the UCSD Pascal implementation, among whose developers, the p in p-code was construed to mean pseudo more often than portable, thus pseudo-code meaning instructions for a pseudo-machine.

Contents

Although the concept was first implemented circa 1966 as O-code for the Basic Combined Programming Language (BCPL) and P code for the language Euler, [2] the term p-code first appeared in the early 1970s. Two early compilers generating p-code were the Pascal-P compiler in 1973, by Kesav V. Nori, Urs Ammann, Kathleen Jensen, Hans-Heinrich Nägeli, and Christian Jacobi, [3] and the Pascal-S compiler in 1975, by Niklaus Wirth.

Programs that have been translated to p-code can either be interpreted by a software program that emulates the behavior of the hypothetical CPU, or translated into the machine code of the CPU on which the program is to run and then executed. If there is sufficient commercial interest, a hardware implementation of the CPU specification may be built (e.g., the Pascal MicroEngine or a version of a Java processor).

Benefits and weaknesses of implementing p-code

Compared to direct translation into native machine code, a two-stage approach involving translation into p-code and execution by interpreting or just-in-time compilation (JIT) offers several advantages.

One of the significant disadvantages of p-code is execution speed, which can sometimes be remedied via JIT compiling. P-code is often also easier to reverse-engineer than native code.

Implementations of p-code

In the early 1980s, at least two operating systems achieved machine independence through extensive use of p-code. The Business Operating System (BOS) was a cross-platform operating system designed to run p-code programs exclusively. The UCSD p-System, developed at The University of California, San Diego, was a self-compiling and self-hosting [ clarification needed ] operating system based on p-code optimized for generation by the Pascal language.

In the 1990s, translation into p-code became a popular strategy for implementations of languages such as Python, Microsoft P-Code in Visual Basic, and Java bytecode in Java. [4] [ failed verification ]

The language Go uses a generic, portable assembly as a form of p-code, implemented by Ken Thompson as an extension of the work on Plan 9 from Bell Labs. Unlike Common Language Runtime (CLR) bytecode or JVM bytecode, there is no stable specification, and the Go build tools do not emit a bytecode format to be used at a later time. The Go assembler uses the generic assembly language as an intermediate representation, and Go executables are machine-specific statically linked binaries. [5]

UCSD p-Machine

Architecture

Like many other p-code machines, the UCSD p-Machine is a stack machine, which means that most instructions take their operands from a stack, and place results back on the stack. Thus, the add instruction replaces the two topmost elements of the stack with their sum. A few instructions take an immediate argument. Like Pascal, the p-code is strongly typed, supporting boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) data types natively.

Some simple instructions:

Insn.   Stack   Stack   Description         before  after   adi     i1 i2   i1+i2   add two integers adr     r1 r2   r1+r2   add two reals inn     i1 s1   b1      set membership; b1 = whether i1 is a member of s1 ldi     i1 i1   i1      load integer constant mov     a1 a2   a2      move not     b1 b1   -b1     boolean negation

Environment

Similar to a real target CPU, the p-System has only one stack shared by procedure stack frames (providing return address, etc.) and the arguments to local instructions. Three of the machine's registers point into the stack (which grows upwards):

Also present is a constant area, and, below that, the heap growing down towards the stack. The NP (the new pointer) register points to the top (lowest used address) of the heap. When EP gets greater than NP, the machine's memory is exhausted.

The fifth register, PC, points at the current instruction in the code area.

Calling conventions

Stack frames look like this:

EP ->       local stack SP -> ...       locals       ...       parameters       ...       return address (previous PC)       previous EP       dynamic link (previous MP)       static link (MP of surrounding procedure) MP -> function return value

The procedure calling sequence works as follows: The call is introduced with

 mst n

where n specifies the difference in nesting levels (remember that Pascal supports nested procedures). This instruction will mark the stack, i.e. reserve the first five cells of the above stack frame, and initialise previous EP, dynamic, and static link. The caller then computes and pushes any parameters for the procedure, and then issues

 cup n, p

to call a user procedure (n being the number of parameters, p the procedure's address). This will save the PC in the return address cell, and set the procedure's address as the new PC.

User procedures begin with the two instructions

 ent 1, i  ent 2, j

The first sets SP to MP + i, the second sets EP to SP + j. So i essentially specifies the space reserved for locals (plus the number of parameters plus 5), and j gives the number of entries needed locally for the stack. Memory exhaustion is checked at this point.

Returning to the caller is accomplished via

 retC

with C giving the return type (i, r, c, b, a as above, and p for no return value). The return value has to be stored in the appropriate cell previously. On all types except p, returning will leave this value on the stack.

Instead of calling a user procedure (cup), standard procedure q can be called with

 csp q

These standard procedures are Pascal procedures like readln() (csp rln), sin() (csp sin), etc. Peculiarly eof() is a p-Code instruction instead.

Example machine

Niklaus Wirth specified a simple p-code machine in the 1976 book Algorithms + Data Structures = Programs . The machine had 3 registers - a program counter p, a base register b, and a top-of-stack register t. There were 8 instructions:

  1. lit 0, a : load constant a
  2. opr 0, a : execute operation a (13 operations: RETURN, 5 math functions, and 7 comparison functions)
  3. lodl, a : load variable l, a
  4. stol, a : store variable l, a
  5. call, a : call procedure a at level l
  6. int 0, a : increment t-register by a
  7. jmp 0, a : jump to a
  8. jpc 0, a : jump conditional to a [6]

This is the code for the machine, written in Pascal:

constamax=2047;{maximum address}levmax=3;{maximum depth of block nesting}cxmax=200;{size of code array}typefct=(lit,opr,lod,sto,cal,int,jmp,jpc);instruction=packedrecordf:fct;l:0..levmax;a:0..amax;end;varcode:array[0..cxmax]ofinstruction;procedureinterpret;conststacksize=500;varp,b,t:integer;{program-, base-, topstack-registers}i:instruction;{instruction register}s:array[1..stacksize]ofinteger;{datastore}functionbase(l:integer):integer;varb1:integer;beginb1:=b;{find base l levels down}whilel>0dobeginb1:=s[b1];l:=l-1end;base:=b1end{base};beginwriteln(' start pl/0');t:=0;b:=1;p:=0;s[1]:=0;s[2]:=0;s[3]:=0;repeati:=code[p];p:=p+1;withidocasefoflit:begint:=t+1;s[t]:=aend;opr:caseaof{operator}0:begin{return}t:=b-1;p:=s[t+3];b:=s[t+2];end;1:s[t]:=-s[t];2:begint:=t-1;s[t]:=s[t]+s[t+1]end;3:begint:=t-1;s[t]:=s[t]-s[t+1]end;4:begint:=t-1;s[t]:=s[t]*s[t+1]end;5:begint:=t-1;s[t]:=s[t]divs[t+1]end;6:s[t]:=ord(odd(s[t]));8:begint:=t-1;s[t]:=ord(s[t]=s[t+1])end;9:begint:=t-1;s[t]:=ord(s[t]<>s[t+1])end;10:begint:=t-1;s[t]:=ord(s[t]<s[t+1])end;11:begint:=t-1;s[t]:=ord(s[t]>=s[t+1])end;12:begint:=t-1;s[t]:=ord(s[t]>s[t+1])end;13:begint:=t-1;s[t]:=ord(s[t]<=s[t+1])end;end;lod:begint:=t+1;s[t]:=s[base(l)+a]end;sto:begins[base(l)+a]:=s[t];writeln(s[t]);t:=t-1end;cal:begin{generate new block mark}s[t+1]:=base(l);s[t+2]:=b;s[t+3]:=p;b:=t+1;p:=aend;int:t:=t+a;jmp:p:=a;jpc:beginifs[t]=0thenp:=a;t:=t-1endend{with, case}untilp=0;writeln(' end pl/0');end{interpret};

This machine was used to run Wirth's PL/0, a Pascal subset compiler used to teach compiler development. [7] [ failed verification ]

Microsoft P-Code

P-Code is a name for several of Microsoft's proprietary intermediate languages. They provided an alternate binary format to machine code. At various times, Microsoft have said p-code is an abbreviation for either packed code [8] or pseudo code. [9]

Microsoft p-code was used in Visual C++ and Visual Basic. Like other p-code implementations, Microsoft p-code enabled a more compact executable at the expense of slower execution.

Other implementations

See also

Related Research Articles

<span class="mw-page-title-main">Oberon (programming language)</span> General-purpose programming language

Oberon is a general-purpose programming language first published in 1987 by Niklaus Wirth and the latest member of the Wirthian family of ALGOL-like languages. Oberon was the result of a concentrated effort to increase the power of Modula-2, the direct successor of Pascal, and simultaneously to reduce its complexity. Its principal new feature is the concept of type extension of record types. It permits constructing new data types on the basis of existing ones and to relate them, deviating from the dogma of strictly static typing of data. Type extension is Wirth's way of inheritance reflecting the viewpoint of the parent site. Oberon was developed as part of the implementation of an operating system, also named Oberon at ETH Zurich in Switzerland. The name was inspired both by the Voyager space probe's pictures of the moon of the planet Uranus, named Oberon, and because Oberon is famous as the king of the elfs.

Pascal is an imperative and procedural programming language, designed by Niklaus Wirth as a small, efficient language intended to encourage good programming practices using structured programming and data structuring. It is named after French mathematician, philosopher and physicist Blaise Pascal.

<span class="mw-page-title-main">UCSD Pascal</span> Programming language released in 1977

UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (UCSD).

Parrot is a discontinued register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and Parrot intermediate representation to Parrot bytecode and execute it. Parrot is free and open-source software.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

ALGOL W is a programming language. It is based on a proposal for ALGOL X by Niklaus Wirth and Tony Hoare as a successor to ALGOL 60. ALGOL W is a relatively simple upgrade of the original ALGOL 60, adding string, bitstring, complex number and reference to record data types and call-by-result passing of parameters, introducing the while statement, replacing switch with the case statement, and generally tightening up the language.

<span class="mw-page-title-main">Modula-3</span>

Modula-3 is a programming language conceived as a successor to an upgraded version of Modula-2 known as Modula-2+. While it has been influential in research circles it has not been adopted widely in industry. It was designed by Luca Cardelli, James Donahue, Lucille Glassman, Mick Jordan, Bill Kalsow and Greg Nelson at the Digital Equipment Corporation (DEC) Systems Research Center (SRC) and the Olivetti Research Center (ORC) in the late 1980s.

Component Pascal is a programming language in the tradition of Niklaus Wirth's Pascal, Modula-2, Oberon and Oberon-2. It bears the name of the language Pascal and preserves its heritage, but is incompatible with Pascal. Instead, it is a minor variant and refinement of Oberon-2 with a more expressive type system and built-in string support. Component Pascal was originally named Oberon/L, and was designed and supported by a small ETH Zürich spin-off company named Oberon microsystems. They developed an integrated development environment (IDE) named BlackBox Component Builder. Since 2014, development and support has been taken over by a small group of volunteers. The first version of the IDE was released in 1994, as Oberon/F. At the time, it presented a novel approach to graphical user interface (GUI) construction based on editable forms, where fields and command buttons are linked to exported variables and executable procedures. This approach bears some similarity to the code-behind way used in Microsoft's .NET 3.0 to access code in Extensible Application Markup Language (XAML), which was released in 2008.

<span class="mw-page-title-main">Oberon-2</span>

Oberon-2 is an extension of the original Oberon programming language that adds limited reflection and object-oriented programming facilities, open arrays as pointer base types, read-only field export, and reintroduces the FOR loop from Modula-2.

PL/0 is a programming language, intended as an educational programming language, that is similar to but much simpler than Pascal, a general-purpose programming language. It serves as an example of how to construct a compiler. It was originally introduced in the book, Algorithms + Data Structures = Programs, by Niklaus Wirth in 1976. It features quite limited language constructs: there are no real numbers, very few basic arithmetic operations and no control-flow constructs other than "if" and "while" blocks. While these limitations make writing real applications in this language impractical, it helps the compiler remain compact and simple.

In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automata with additional load/store operations or multiple stacks and hence are Turing-complete.

Object Pascal is an extension to the programming language Pascal that provides object-oriented programming (OOP) features such as classes and methods.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

IP Pascal is an implementation of the Pascal programming language using the IP portability platform, a multiple machine, operating system and language implementation system. It implements the language "Pascaline", and has passed the Pascal Validation Suite.

S-algol is a computer programming language derivative of ALGOL 60 developed at the University of St Andrews in 1979 by Ron Morrison and Tony Davie. The language is a modification of ALGOL to contain orthogonal data types that Morrison created for his PhD thesis. Morrison would go on to become professor at the university and head of the department of computer science. The S-algol language was used for teaching at the university at an undergraduate level until 1999. It was also the language taught for several years in the 1980s at a local school in St. Andrews, Madras College. The computer science text Recursive Descent Compiling describes a recursive descent compiler for S-algol, implemented in S-algol.

Joel McCormack is an American computer scientist who designed the NCR Corporation version of the p-code machine, which is a kind of stack machine popular in the 1970s as the preferred way to implement new computing architectures and languages such as Pascal and BCPL. The NCR design shares no common architecture with the Pascal MicroEngine designed by Western Digital but both were meant to execute the UCSD p-System.[1,2]

Devised by Niklaus Wirth in the late 1960s and early 1970s, Pascal is a programming language. Originally produced by Borland Software Corporation, Embarcadero Delphi is composed of an IDE, set of standard libraries, and a Pascal-based language commonly called either Object Pascal, Delphi Pascal, or simply 'Delphi'. Since first released, it has become the most popular commercial Pascal implementation.

PL360 is a system programming language designed by Niklaus Wirth and written by Wirth, Joseph W. Wells Jr., and Edwin Satterthwaite Jr. for the IBM System/360 computer at Stanford University. A description of PL360 was published in early 1968, although the implementation was probably completed before Wirth left Stanford in 1967.

SuperPascal is an imperative, concurrent computing programming language developed by Per Brinch Hansen. It was designed as a publication language: a thinking tool to enable the clear and concise expression of concepts in parallel programming. This is in contrast with implementation languages which are often complicated with machine details and historical conventions. It was created to address the need at the time for a parallel publication language. Arguably, few languages today are expressive and concise enough to be used as thinking tools.

Java bytecode is the instruction set of the Java virtual machine (JVM), crucial for executing programs written in the Java language and other JVM-compatible languages. Each bytecode operation in the JVM is represented by a single byte, hence the name "bytecode", making it a compact form of instruction. This intermediate form enables Java programs to be platform-independent, as they are compiled not to native machine code but to a universally executable format across different JVM implementations.

References

  1. Upton, Eben; Duntemann, Jeffrey; Roberts, Ralph; Mamtora, Tim; Everard, Ben (2016-09-13). Learning Computer Architecture with Raspberry Pi. John Wiley & Sons. ISBN   978-1-119-18393-8.
  2. Wirth, Niklaus; Weber, Helmut (1966). "EULER: a generalization of ALGOL, and its formal definition: Part II". Communications of the ACM . New York, USA: Association for Computing Machinery (ACM). 9 (2): 89–99. doi: 10.1145/365170.365202 . S2CID   12124100.
  3. Nori, Kesav V.; Ammann, Urs; Jensen, Kathleen; Nägeli, Hans-Heinrich; Jacobi, Christian (1975). The Pascal P Compiler Implementation Notes. Zürich, Switzerland: Eidgenössische Technische Hochschule (ETH).
  4. "The p-code system".
  5. Pike, Robert C. (2016). "The Design of the Go Assembler". YouTube (Conference talk). Archived from the original on 2021-12-11. Retrieved 2017-08-25.
  6. "Category Archives: Wirth - Euler - Designed by Niklaus Wirth and Helmut Weber". Pascal for small machines - Wirth languages, Pascal, UCSD, Turbo, Delphi, Freepascal, Oberon. 2018-08-02.
  7. Alpert, Donald (September 1979). A Pascal P-Code Interpreter for the Stanford Emmy (PDF) (Report). Computer Systems Laboratory, Departments of Eleotrioal Engineering and Computer Scienoes, Stanford University. Technioal Note No. 164.
  8. Padawer, Andy (April 1992). "Microsoft P-Code Technology". Microsoft Developer Network . Archived from the original on 2001-02-22.
  9. "Compiling Your Project to Native Code". Visual Studio 6.0 Documentation. 2007. Archived from the original on 2007-02-27.

Further reading