ML/I

Last updated February 22, 2024

ML/1 (Macro Language/One) is a powerful general-purpose macro processor.^[1] Its uses include:

editing, modifying, correcting, or reformatting text files
translating source code from one programming language to another
acting as a source-code preprocessor to allow the user to add new syntactic forms to an existing programming language
supporting program source-code parameterization (e.g. a parameter might determine whether debugging statements are to be included in the program source code that is passed to the compiler)
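The parameterization use case can be illustrated with a short Python sketch. It is not ML/1 notation. The DEBUG and ENDDEBUG marker lines and the include_debug parameter are hypothetical. The sketch works line by line for brevity, whereas ML/1 itself treats its input as a stream. It shows only the idea that a single setting decides which text is passed on to the compiler.

```python
def preprocess(source: str, include_debug: bool) -> str:
    """Keep or drop lines between hypothetical DEBUG and ENDDEBUG marker lines."""
    output = []
    in_debug = False
    for line in source.splitlines():
        marker = line.strip()
        if marker == "DEBUG":
            in_debug = True
            continue  # the marker itself never reaches the output
        if marker == "ENDDEBUG":
            in_debug = False
            continue
        if in_debug and not include_debug:
            continue  # debugging statement suppressed by the parameter
        output.append(line)
    return "\n".join(output)


program = """x = compute()
DEBUG
print("x =", x)
ENDDEBUG
store(x)"""

print(preprocess(program, include_debug=False))
# x = compute()
# store(x)
```

With include_debug=True the print statement is kept and only the two marker lines are removed.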

ML/1 was developed in 1966 by Peter J. Brown as part of his PhD research at Cambridge University in England.^[2]

Robert D. Eager, one of Peter Brown's colleagues at the University of Kent, later rewrote ML/I, first in BCPL in 1981 and then in C in 1984, which increased its portability.

Note that Peter Brown's original name for the language was ML/I, where (as in IBM's PL/I) the last character is the Roman numeral "I", not the Arabic numeral "1". Most subsequent implementations, however, have been called ML/1 (where the last character is the Arabic numeral "1").

Since Eager's rewrite, ML/1 has been ported to many platforms and operating systems, including VMS, MVS, MS-DOS, OS/2, and UNIX. In his implementations of ML/1, Eager has added features and capabilities beyond those originally specified in Peter Brown's thesis.

Eager's version is available for multiple platforms from the ML/1 website, http://www.ml1.org.uk, which also provides further information about ML/1 and documentation, including a tutorial, a simple introductory guide, and a full user manual.

Although the total number of ML/1 users is small, they are spread around the world: Eager has corresponded with ML/1 users in the United States, Canada, Australia, New Zealand, Germany, the Netherlands, and India.

In a 1976 paper, Andrew S. Tanenbaum describes using ML/I as a compiler-compiler.^[3]

Overview

ML/I accepts input in completely free form, treating data as a stream of bytes rather than a series of lines or records. It does not require macro calls to be marked by any special flag, which makes it particularly useful for processing arbitrary text. Replacements of text can be simple (e.g. PIG is to be replaced by DOG) or complex (e.g. replace the item between the third and fourth commas after the last full stop with the contents of some counter).
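The complex replacement described above can be made concrete with a Python sketch. It is written in Python rather than ML/1 macro notation. The function name replace_item_after_last_stop and the choice to pass the counter's value as an argument are assumptions made for illustration. The function finds the last full stop, counts commas after it, and substitutes the counter's value for whatever lies between the third and fourth commas.

```python
def replace_item_after_last_stop(text: str, counter: int) -> str:
    """Replace the item between the 3rd and 4th commas after the last full stop."""
    # Scanning starts just after the last full stop (or at 0 if there is none).
    pos = text.rfind(".") + 1
    commas = []
    while len(commas) < 4:
        pos = text.find(",", pos)
        if pos == -1:
            return text  # fewer than four commas: leave the text unchanged
        commas.append(pos)
        pos += 1
    third, fourth = commas[2], commas[3]
    # Keep up to the third comma, insert the counter, keep the fourth comma onward.
    return text[: third + 1] + str(counter) + text[fourth:]


print(replace_item_after_last_stop("a,b,c,d,e. p,q,r,s,t", 9))
# → a,b,c,d,e. p,q,r,9,t
```

Only the commas after the last full stop are counted, so the items before it are left untouched.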

ML/I was used to implement several items of portable software, including itself. It was originally written in a special descriptive language and then mapped into a suitable language for each target system; this mapping was itself done using ML/I. The descriptive language existed in two forms: high level and low level.

Once mapped onto a target system in this way, ML/I was often used to implement SILs (system implementation languages, such as C) for the new generation of 16-bit minicomputers.

How ML/1 works

In the most basic terms, here's how ML/1 works.

The user supplies ML/1 with a file containing input text.
In another file (or, optionally, in the same file) the user supplies a set of ML/1 macros. The macros tell the ML/1 interpreter what insertions, deletions, expansions, translations and other modifications the user wants made to the input text.
When ML/1 is run on the input text, ML/1 follows the instructions in the ML/1 macros, changes the text, and writes out a new file containing the modified text.

Distinctive features of ML/1

There are several ways in which ML/1 is more powerful than simple "scan and replace" utilities.

ML/1 does not process text on a character-string by character-string basis; it processes text on a word by word (or, in ML/1's terminology, on an "atom by atom") basis. For many applications, it is extremely useful to be able to process a text as a sequence of atoms rather than a sequence of characters. Suppose, for example, that we wish to translate a program from a programming language that has a DO ... END syntax, into a language that has a BEGIN ... END syntax. We therefore wish to replace "DO" with "BEGIN". If we do the replacement with an ordinary scan-and-replace utility, all occurrences of the string "DO" will be changed to "BEGIN", including any "DO"s that are embedded in words such as "DOCUMENT" (which will become "BEGINCUMENT"). With ML/1, in contrast, this will not happen because the string "DO" will trigger text-replacement only when it occurs as a word (that is, when it is preceded and followed by delimiters such as spaces, tabs, newlines, or punctuation characters).
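The difference between the two approaches can be demonstrated with a small Python sketch. The atom rule used here (a run of letters and digits, or any single other character) is a simplification assumed for illustration, and replace_atoms is a hypothetical helper rather than part of ML/1. The point is only that a macro name matches whole atoms, never fragments of longer words.

```python
import re

# Assumed, simplified atom rule: a run of letters/digits, or any single
# other character (space, punctuation, newline).
ATOM = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]")


def atoms(text: str) -> list[str]:
    """Split text into atoms; joining them reproduces the original text."""
    return ATOM.findall(text)


def replace_atoms(text: str, macros: dict[str, str]) -> str:
    """Replace an atom only when it exactly matches a macro name."""
    return "".join(macros.get(atom, atom) for atom in atoms(text))


source = "DO X = 1; DOCUMENT(X) END"
print(source.replace("DO", "BEGIN"))           # naive scan-and-replace
# → BEGIN X = 1; BEGINCUMENT(X) END
print(replace_atoms(source, {"DO": "BEGIN"}))  # atom-by-atom replacement
# → BEGIN X = 1; DOCUMENT(X) END
```

Because every atom is copied through unchanged unless it exactly matches a macro name, DOCUMENT survives intact while the keyword DO is translated.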

Rather than operating line by line, ML/1 recognizes patterns of text that can be complex and nested, contain multiple delimiters, and span many lines. It can, for instance, process an IF ... THEN ... ELSE ... ENDIF structure, common in programming languages, that spans multiple lines and contains embedded text that may itself include a nested IF ... THEN ... ELSE ... ENDIF structure.
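Recognizing such nested constructs requires tracking nesting depth rather than matching the next ENDIF blindly. The Python sketch below rewrites IF c THEN a ELSE b ENDIF into an arbitrary target form, (c ? a : b), chosen only for illustration. It splits the input on whitespace instead of using ML/1's atoms, assumes every IF has an ELSE, and uses the hypothetical name translate. In ML/1 the construct would instead be described by a macro definition with multiple delimiters.

```python
def translate(tokens: list[str]) -> str:
    """Rewrite IF c THEN a ELSE b ENDIF as (c ? a : b), including nested IFs."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "IF":
            out.append(tokens[i])  # ordinary token: copy it through
            i += 1
            continue
        # Gather the three parts of this IF, tracking nesting depth so that a
        # nested IF's own THEN/ELSE/ENDIF are not mistaken for this one's.
        parts = {"IF": [], "THEN": [], "ELSE": []}
        current = "IF"
        depth = 0
        i += 1
        while True:
            if i >= len(tokens):
                raise ValueError("IF without matching ENDIF")
            tok = tokens[i]
            i += 1
            if depth == 0 and tok in ("THEN", "ELSE"):
                current = tok  # move on to the next part of this IF
                continue
            if depth == 0 and tok == "ENDIF":
                break  # end of this IF construct
            if tok == "IF":
                depth += 1
            elif tok == "ENDIF":
                depth -= 1
            parts[current].append(tok)
        # Translate each part recursively, which handles any nested IFs.
        cond, then_part, else_part = (translate(parts[k]) for k in ("IF", "THEN", "ELSE"))
        out.append(f"({cond} ? {then_part} : {else_part})")
    return " ".join(out)


source = """IF a THEN
    IF b THEN x ELSE y ENDIF
ELSE
    z
ENDIF"""
print(translate(source.split()))
# → (a ? (b ? x : y) : z)
```

The depth counter is what lets the inner ENDIF close the inner IF while the outer construct keeps collecting its THEN part.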

ML/1 can recognize embedded comments and literal quotations, and protect them from alteration. Ordinary scan-and-replace utilities change strings indiscriminately, whether they occur in the program text as a keyword or variable name, embedded in a comment, or in a quoted literal.
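A scanner that protects comments and literals can be sketched in Python as follows. The C-style /* ... */ comments and double-quoted strings are assumptions about the language being processed (escaped quotes inside strings are not handled), and replace_outside_protected is a hypothetical name. The sketch only illustrates copying protected regions through untouched while replacing whole identifiers elsewhere.

```python
import re

# Assumed C-style identifiers: letters, digits and underscores.
IDENT = re.compile(r"[A-Za-z0-9_]+")


def replace_outside_protected(text: str, macros: dict[str, str]) -> str:
    """Replace whole identifiers, but copy comments and string literals verbatim."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            end = len(text) if end == -1 else end + 2
            out.append(text[i:end])  # comment copied unchanged
            i = end
        elif text[i] == '"':
            end = text.find('"', i + 1)
            end = len(text) if end == -1 else end + 1
            out.append(text[i:end])  # string literal copied unchanged
            i = end
        else:
            m = IDENT.match(text, i)
            if m:
                word = m.group()
                out.append(macros.get(word, word))  # whole identifiers only
                i = m.end()
            else:
                out.append(text[i])  # any other character passes through
                i += 1
    return "".join(out)


code = 'count = count + 1; /* count lines */ puts("count");'
print(replace_outside_protected(code, {"count": "n"}))
# → n = n + 1; /* count lines */ puts("count");
```

An unterminated comment or string is treated as running to the end of the input, so it is never altered.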

In order to deal with such complicated patterns, ML/1 needs to be a programming language in its own right. Like other programming languages, ML/1 supports variables and assignment statements, GOTOs and labels, IF ... THEN tests and loops. These features give ML/1 an unusual degree of power and flexibility.

Limitations

ML/1 is case-sensitive, so it does not support case-insensitive text processing.

References

[1] Cole, A. J. (26 November 1981). Macro Processors. CUP Archive. p. 85. ISBN 978-0-521-28560-5.
[2] Brown, P. J. (1967). "The ML/I macro processor". Communications of the ACM. 10 (10): 618–623. doi:10.1145/363717.363746. ISSN 0001-0782.
[3] Tanenbaum, A. S. (1976). "A General-Purpose Macro Processor as a Poor Man's Compiler-Compiler". IEEE Transactions on Software Engineering. SE-2 (2): 121–125. doi:10.1109/TSE.1976.233539. ISSN 0098-5589. S2CID 16317510.

External links

http://www.ml1.org.uk

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
