XPL, for expert's programming language, [1] is a programming language based on PL/I, a portable one-pass compiler written in its own language, and a parser generator tool for easily implementing similar compilers for other languages. XPL was designed in 1967 as a way to teach compiler design principles and as a starting point for students to build compilers for their own languages.
XPL was designed and implemented by William M. McKeeman, [2] [3] David B. Wortman, James J. Horning and others at Stanford University. XPL was first announced at the 1968 Fall Joint Computer Conference. The methods and compiler are described in detail in the 1971 textbook A Compiler Generator.
The authors called the combined work a 'compiler generator', but that term implies that little or no language- or target-specific programming is needed to build a compiler for a new language or new target. A better label for XPL is a translator writing system: it helps in writing a compiler with less new or changed programming code.
The XPL language is a simple, small, efficient dialect of PL/I intended mainly for the task of writing compilers, although once available it was also used for other purposes. XPL can be compiled easily to most modern machines by a simple compiler. Compiler internals can be written easily in XPL, and the code is easy to read. The PL/I language was designed by an IBM committee in 1964 as a comprehensive language replacing Fortran, COBOL, and ALGOL, and meeting all customer and internal needs. These ambitious goals made PL/I complex, hard to implement efficiently, and sometimes surprising when used. XPL has one added feature not found in PL/I: a STRING datatype with dynamic lengths. String values live in a separate text-only heap memory space, with automatic garbage collection of stale values. Much of what a simple compiler does is manipulate input text and output byte streams, so this feature helps simplify XPL-based compilers.
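To illustrate the idea, here is a minimal sketch in Python (not XCOM's actual implementation) of descriptor-based dynamic strings: every string value is a small descriptor pointing into one shared text heap, a substring is just a new descriptor, and a compacting collector reclaims stale text. The StringPool class and its method names are invented for this illustration.

    class StringPool:
        def __init__(self):
            self.heap = bytearray()             # the text-only heap

        def intern(self, text):
            """Append new text to the heap; return its descriptor."""
            desc = [len(self.heap), len(text)]  # [offset, length]; mutable so GC can move it
            self.heap += text.encode()
            return desc

        def substring(self, desc, start, length):
            """O(1): a substring is a new descriptor; no characters are copied."""
            return [desc[0] + start, length]

        def text(self, desc):
            off, length = desc
            return self.heap[off:off + length].decode()

        def collect(self, live):
            """Compacting collector: keep only text reachable from live descriptors."""
            new_heap = bytearray()
            for desc in live:
                off, length = desc
                desc[0] = len(new_heap)         # relocate the descriptor
                new_heap += self.heap[off:off + length]
            self.heap = new_heap

    pool = StringPool()
    line = pool.intern("DECLARE I FIXED;")
    word = pool.substring(line, 0, 7)           # "DECLARE", without copying
    pool.collect([word])                        # text of the full line is now stale
    assert pool.text(word) == "DECLARE"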
The XPL compiler, called XCOM, is a one-pass compiler using a table-driven parser and simple code generation techniques. Versions of XCOM exist for different machine architectures, using different hand-written code generation modules for those targets. The original target was the IBM System/360, whose instruction set is a proper subset of the IBM System/370, System/390 and System z instruction sets.
XCOM compiles from XPL source code, but since XCOM itself is written in XPL it can compile itself – it is a self-compiling compiler, not reliant on other compilers. Several famous languages have self-compiling compilers, including Burroughs B5000 Algol, PL/I, C, LISP, and Java. Creating such compilers is a chicken-and-egg conundrum. The language is first implemented by a temporary compiler written in some other language, or even by an interpreter (often an interpreter for an intermediate code, as BCPL can do with intcode or O-code).
XCOM began as an Algol program running on Burroughs machines, translating XPL source code into System/360 machine code. The XPL team manually translated its Algol source code into XPL. That XPL version of XCOM was then compiled on the Burroughs machine, creating a self-compiling XCOM for System/360 machines. The Algol version was then thrown away, and all further improvements happened in the XPL version only. This is called bootstrapping the compiler. The authors of XPL invented the tombstone diagram, or T-diagram, to document the bootstrapping process.
Retargeting the compiler for a new machine architecture is a similar exercise, except only the code generation modules need to be changed.
XCOM is a one-pass compiler (but with an emitted-code fix-up process for forward branches, loops and other defined situations). It emits machine code for each statement as each grammar rule within the statement is recognized, rather than waiting until it has parsed the entire procedure or entire program. There are no parse trees or other required intermediate program forms, and no loop-wide or procedure-wide optimizations. XCOM does, however, perform peephole optimization. The code-generation response to each grammar rule is attached to that rule. This immediate approach can result in inefficient code and inefficient use of machine registers. Such inefficiencies are offset by the efficiency of implementation, namely the use of the dynamic strings mentioned earlier: in processing text during compilation, substring operations are performed frequently, and each is as fast as an assignment to an integer because the substring's characters are not moved. In short, the compiler is quick, easy to teach in a short course, fits into modest-sized memories, and is easy to change for different languages or different target machines.
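The fix-up process for forward branches can be sketched as follows. This is a hypothetical Python model, not XCOM's actual emitter: a forward branch is emitted with a placeholder target, its location is recorded, and the placeholder is patched once the label's address becomes known.

    code = []            # emitted "machine code", one instruction per slot
    fixups = {}          # label -> list of code indices awaiting that address
    labels = {}          # label -> resolved code address

    def emit(instr):
        code.append(instr)

    def emit_branch(op, label):
        if label in labels:                      # backward branch: target known
            emit((op, labels[label]))
        else:                                    # forward branch: patch later
            fixups.setdefault(label, []).append(len(code))
            emit((op, None))

    def define_label(label):
        labels[label] = len(code)
        for site in fixups.pop(label, []):       # fix up every waiting branch
            op, _ = code[site]
            code[site] = (op, labels[label])

    # IF cond THEN stmt: branch around the THEN part while its end is still unknown
    emit(("LOAD", "cond"))
    emit_branch("BRANCH_IF_FALSE", "endif")
    emit(("CALL", "stmt"))
    define_label("endif")                        # the placeholder is patched here
    assert code[1] == ("BRANCH_IF_FALSE", 3)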
The XCOM compiler has a hand-written lexical scanner and a mechanically generated parser. The syntax of the compiler's input language (in this case, XPL) is described by a simplified BNF grammar. XPL's grammar analyzer tool, ANALYZER or XA, turns this into a set of large data tables describing all legal combinations of the syntax rules and how to distinguish them. This table-generation step is redone only when the language is changed. When the compiler runs, those data tables are used by a small, language-independent parsing algorithm to parse and respond to the input language. This style of table-driven parser is generally easier to write than an entirely hand-written recursive descent parser. XCOM uses a bottom-up parsing method, in which the compiler can delay its decision about which syntax rule it has encountered until it has seen the rightmost end of that phrase. This handles a wider range of programming languages than top-down methods, in which the compiler must guess or commit to a specific syntax rule early, when it has seen only the left end of a phrase.
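The division of labor can be sketched as follows, assuming hand-built SLR-style tables for a toy expression grammar (XA computes such tables mechanically; these were worked out by hand, and the table layout is invented for the illustration). The grammar lives entirely in data; a small, language-independent driver interprets it.

    RULES = [("E'", 1), ("E", 3), ("E", 1), ("T", 1)]  # (left side, right-side length)
    # rule 0: E' -> E    rule 1: E -> E + T    rule 2: E -> T    rule 3: T -> id

    ACTION = {  # (state, lookahead) -> ("shift", state) | ("reduce", rule) | "accept"
        (0, "id"): ("shift", 3),
        (1, "+"): ("shift", 4), (1, "$"): "accept",
        (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
        (3, "+"): ("reduce", 3), (3, "$"): ("reduce", 3),
        (4, "id"): ("shift", 3),
        (5, "+"): ("reduce", 1), (5, "$"): ("reduce", 1),
    }
    GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 5}

    def parse(tokens):
        stack = [0]                       # parse stack of states
        tokens = tokens + ["$"]
        i = 0
        while True:
            act = ACTION.get((stack[-1], tokens[i]))
            if act == "accept":
                return True
            if act is None:
                return False              # syntax error
            kind, arg = act
            if kind == "shift":
                stack.append(arg); i += 1
            else:                         # reduce: pop the rule's right side,
                lhs, rhs_len = RULES[arg] # then consult GOTO on its left side
                del stack[len(stack) - rhs_len:]
                stack.append(GOTO[(stack[-1], lhs)])

    assert parse(["id", "+", "id"]) and not parse(["id", "+"])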
XPL includes a minimal runtime support library for allocating and garbage-collecting XPL string values. The source code for this library must be included in almost every program written in XPL.
The last piece of the XPL compiler writing system is an example compiler named SKELETON. This is simply XCOM with parse tables for a toy grammar in place of XPL's full grammar. It is a starting point for building a compiler for a new language, if that language differs much from XPL.
XPL runs under the control of a monitor, XMON, which is the only operating-system-specific part of the system. XMON acts as a "loader" for XCOM itself and for any programs developed using XCOM, and it also provides three auxiliary storage devices for XCOM's use, accessed directly by block number. The originally published XMON was optimized for IBM 2311 disks. An XMON parameter, FILE=, enabled the monitor to use other disks with larger block sizes efficiently. [4] The working disk block size was also a compile-time constant in XCOM. [5]
XMON used a very simple strategy for direct disk access. NOTE provided the address of a disk track. POINT set the location of the next disk track to be the address previously returned by NOTE. This strategy was adopted to allow easy porting of XMON to other operating systems, and to avoid the much more complicated direct-access disk options available at that time. [6]
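In modern terms, NOTE and POINT behave much like the tell and seek operations on a byte-stream file, applied to fixed-size blocks. The following Python model is only an analogy, not OS/360 code; the WorkFile class, the block size, and the method names are invented for this illustration.

    BLOCK = 3625                    # e.g. one full IBM 2311 track used as one block

    class WorkFile:
        def __init__(self, path):
            self.f = open(path, "w+b")
            self.last = 0           # block address of the last transfer

        def note(self):
            """Like NOTE: report the address of the block just transferred."""
            return self.last

        def point(self, block_no):
            """Like POINT: the next read/write will use this block address."""
            self.f.seek(block_no * BLOCK)

        def write(self, data):
            self.last = self.f.tell() // BLOCK
            self.f.write(data.ljust(BLOCK, b"\0"))

        def read(self):
            self.last = self.f.tell() // BLOCK
            return self.f.read(BLOCK)

    wf = WorkFile("scratch.dat")
    wf.write(b"first block of work data")
    addr = wf.note()                # remember where that block landed
    wf.write(b"second block")
    wf.point(addr)                  # revisit the earlier block by its address
    assert wf.read().startswith(b"first")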
Converting XMON from its primitive use of NOTE, POINT and READ/WRITE disk operations (with exactly one block per track) to EXCP for writing and creating new records and XDAP for reading and updating old records (with n blocks per track, where n was computed at run time from the target device's physical characteristics and could be significantly greater than one) significantly improved application performance and decreased operating-system overhead.
Although originally developed for OS/360, XMON (either the original NOTE, POINT and READ/WRITE implementation or the EXCP and XDAP enhancement) runs on IBM's subsequently released operating systems, including OS/370, XA, OS/390 and z/OS, generally without any changes.
XCOM originally used a now-obsolete bottom-up parse-table method called Mixed Strategy Precedence (MSP), invented by the XPL team. (The officially released version retains the MSP parser and does not include the later "peephole optimizations" and additional data types developed outside the original implementation team.) MSP is a generalization of the simple precedence parser method invented by Niklaus Wirth for PL360. Simple precedence is itself a generalization of the trivially simple operator-precedence methods that work nicely for expressions like A+B*(C+D)-E. MSP tables include a list of expected triplets of language symbols. This list grows as the cube of the grammar size, and becomes quite large for typical full programming languages. XPL-derived compilers were difficult to fit onto the minicomputers of the 1970s, with their limited memories. [nb 1] MSP is also not powerful enough to handle all likely grammars. It is applicable only when the language designer can tweak the language definition to fit MSP's restrictions before the language is widely used.
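For comparison, the operator-precedence idea can be sketched in its closely related shunting-yard form: every decision compares only the operator on top of a stack with the incoming one, which is why such tables stay tiny for plain expressions. This Python sketch is illustrative only, not any of the parsers described here.

    PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

    def to_postfix(tokens):
        out, ops = [], []
        for t in tokens:
            if t.isalnum():                      # operand: pass straight through
                out.append(t)
            elif t == "(":
                ops.append(t)
            elif t == ")":
                while ops[-1] != "(":            # reduce back to the open paren
                    out.append(ops.pop())
                ops.pop()
            else:                                # operator: reduce while the stacked
                while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[t]:
                    out.append(ops.pop())        # operator binds at least as tightly
                ops.append(t)
        return out + ops[::-1]

    # The expression from the text: A+B*(C+D)-E
    assert to_postfix(list("A+B*(C+D)-E")) == list("ABCD+*+E-")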
The University of Toronto subsequently changed XCOM and XA to instead use a variant of Donald Knuth's LR parser bottom-up method. [nb 2] XCOM's variant is called Simple LR or SLR. It handles more grammars than MSP but not quite as many grammars as LALR or full LR(1). The differences from LR(1) are mostly in the table generator's algorithms, not in the compile-time parser method. XCOM and XA predate the widespread availability of Unix and its yacc parser generator tool. XA and yacc have similar purposes.
XPL is open source. The System/360 version of XPL was distributed through the IBM SHARE users organization. Other groups ported XPL onto many of the larger machines of the 1970s. Various groups extended XPL, or used XPL to implement other moderate-sized languages.
XPL has been used to develop a number of compilers for various languages and systems.
XPL continues to be ported to current computers. An x86/FreeBSD port was done in 2000, [8] an x86/Linux port in 2015, and an XPL to C translator in 2017. [9] [10]
ALGOL is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the Association for Computing Machinery (ACM) in textbooks and academic sources for more than thirty years.
In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.
In computer science, an LALR parser (look-ahead LR parser) is part of the compiling process in which human-readable text is converted into a structured representation to be read by computers. An LALR parser is a software tool that processes (parses) text into a very specific internal representation that other programs, such as compilers, can work with. This process happens according to a set of production rules specified by a formal grammar for a computer language.
PL/I is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has been in continuous use by academic, commercial and industrial organizations since it was introduced in the 1960s.
In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.
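For example, a fragment of an expression grammar might be written in BNF as follows (an illustrative fragment, not drawn from any particular specification):

    <expr>   ::= <expr> "+" <term> | <term>
    <term>   ::= <term> "*" <factor> | <factor>
    <factor> ::= "(" <expr> ")" | <identifier>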
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
Lexical tokenization is the conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In the case of a natural language, those categories include nouns, verbs, adjectives and punctuation. In the case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is related to the type of tokenization used in large language models (LLMs), but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.
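A minimal sketch of such a lexer in Python (the token categories and names here are invented for illustration):

    import re

    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("OP",     r"[+\-*/=]"),
        ("GROUP",  r"[()]"),
        ("SKIP",   r"\s+"),
    ]
    LEXER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def tokenize(text):
        """Yield (category, lexeme) pairs, discarding whitespace."""
        for m in LEXER.finditer(text):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    assert list(tokenize("x = y + 42")) == [
        ("IDENT", "x"), ("OP", "="), ("IDENT", "y"), ("OP", "+"), ("NUMBER", "42")]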
PL/C is an instructional dialect of the programming language PL/I, developed at the Department of Computer Science of Cornell University in the early 1970s in an effort headed by Professor Richard W. Conway and graduate student Thomas R. Wilcox. PL/C was developed with the specific goal of being used for teaching programming. The PL/C compiler, which implemented almost all of the large PL/I language, had the unusual capability of never failing to compile a program, through the use of extensive automatic correction of many syntax errors and by converting any remaining syntax errors to output statements. This was important because, at the time, students submitted their programs on IBM punch cards and might not get their output back for several hours. Over 250 other universities adopted PL/C; as one late-1970s textbook on PL/I noted, "PL/C ... the compiler for PL/I developed at Cornell University ... is widely used in teaching programming." Similarly, a mid-late-1970s survey of programming languages said that "PL/C is a widely used dialect of PL/I."
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.
The Syntax/Semantic Language (S/SL) is an executable high level specification language for recursive descent parsers, semantic analyzers and code generators developed by James Cordy, Ric Holt and David Wortman at the University of Toronto in 1980.
The PL/M programming language (an acronym of Programming Language for Microcomputers) is a high-level language conceived and developed by Gary Kildall in 1973 for Hank Smith at Intel for its microprocessors.
The dangling else is a problem in the grammars of programming languages and their parser generators, in which an optional else clause in an if–then(–else) statement makes nested conditionals ambiguous. Formally, the reference context-free grammar of the language is ambiguous, meaning there is more than one correct parse tree.
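For example, the nested statement

    if a then if b then S1 else S2

can be read either as if a then (if b then S1 else S2), attaching the else to the nearer if (the usual resolution), or as if a then (if b then S1) else S2, attaching it to the outer one.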
ALGOL 60 is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58, which had introduced code blocks and the begin and end pairs for delimiting them, representing a key advance in the rise of structured programming. ALGOL 60 was one of the first languages implementing function definitions. ALGOL 60 function definitions could be nested within one another, with lexical scope. It gave rise to many other languages, including CPL, PL/I, Simula, BCPL, B, Pascal, and C. Practically every computer of the era had a systems programming language based on ALGOL 60 concepts.
PL/S, short for Programming Language/Systems, is a "machine-oriented" programming language based on PL/I. It was developed by IBM in the late 1960s, under the name Basic Systems Language (BSL), as a replacement for assembly language on internal software projects; it included support for inline assembly and explicit control over register usage.
META II is a domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at UCLA. META II uses what Schorre called syntax equations. Its operation is simply explained as:
Each syntax equation is translated into a recursive subroutine which tests the input string for a particular phrase structure, and deletes it if found.
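That translation scheme can be sketched in Python (this shows the general idea, not Schorre's actual notation or generated code; the two example equations are invented):

    class Input:
        def __init__(self, text):
            self.toks, self.pos = text.split(), 0

        def match(self, tok):
            """Test for tok at the current position; consume ("delete") it if found."""
            if self.pos < len(self.toks) and self.toks[self.pos] == tok:
                self.pos += 1
                return True
            return False

    # Equation: EXPR = TERM ('+' TERM)*
    def expr(inp):
        if not term(inp):
            return False
        while inp.match("+"):
            if not term(inp):
                return False
        return True

    # Equation: TERM = 'a' | '(' EXPR ')'
    def term(inp):
        if inp.match("a"):
            return True
        return inp.match("(") and expr(inp) and inp.match(")")

    assert expr(Input("a + ( a + a )"))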
ALGOL 68RS is the second ALGOL 68 compiler written by I. F. Currie and J. D. Morrison, at the Royal Signals and Radar Establishment (RSRE). Unlike the earlier ALGOL 68-R, it was designed to be portable, and implemented the language of the Revised Report.
PL360 is a system programming language designed by Niklaus Wirth and written by Wirth, Joseph W. Wells Jr., and Edwin Satterthwaite Jr. for the IBM System/360 computer at Stanford University. A description of PL360 was published in early 1968, although the implementation was probably completed before Wirth left Stanford in 1967.
A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.
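For example, with the rules E → E + T, E → T and T → id, a shift-reduce parser handles the input A + B roughly as follows (a hand-worked trace):

    Stack      Input      Action
    (empty)    A + B $    shift
    A          + B $      reduce by T → id
    T          + B $      reduce by E → T
    E          + B $      shift
    E +        B $        shift
    E + B      $          reduce by T → id
    E + T      $          reduce by E → E + T
    E          $          accept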