M4 (computer language)

m4
Paradigm: macro
Designed by: Brian Kernighan, Dennis Ritchie
First appeared: 1977
Major implementations: GNU m4

m4 is a general-purpose macro processor included in most Unix-like operating systems, and is a component of the POSIX standard.


The language was designed by Brian Kernighan and Dennis Ritchie for the original versions of UNIX. It is an extension of an earlier macro processor, m3, which Ritchie wrote for the little-known AP-3 minicomputer. [1]

The macro preprocessor operates as a text-replacement tool. It is used to reuse text templates, typically in computer programming applications but also in text editing and text processing. Most users require m4 as a dependency of GNU Autoconf.

History

Macro processors became popular when programmers commonly used assembly language. In those early days of programming, programmers noticed that much of their code consisted of repeated text, and they invented simple means of reusing it. They soon discovered the advantages not only of reusing entire blocks of text but also of substituting different values for similar parameters. This defined the typical uses of macro processors at the time. [2]

In the 1960s, M6, an early general-purpose macro processor developed by Douglas McIlroy, Robert Morris and Andrew Hall, was in use at AT&T Bell Laboratories. [3]

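Kernighan and Ritchie developed m4 in 1977, basing it on the ideas of Christopher Strachey. The distinguishing features of this style of macro preprocessing included free-form syntax, rather than the line-based syntax typical of assembler macro processors, and a high degree of rescanning, in which the text produced by a macro expansion is itself scanned again for further macro calls.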

The implementation of Rational Fortran used m4 as its macro engine from the beginning, and most Unix variants ship with it.

As of 2024, many applications continue to use m4 as part of the GNU Project's Autoconf. It is also used in the configuration process of sendmail, a widespread mail transfer agent, and for generating footprints in the gEDA toolsuite. The SELinux Reference Policy relies heavily on the m4 macro processor.

m4 has many uses in code generation, but (as with any macro processor) problems can be hard to debug. [4]

Features

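m4 offers these facilities: text replacement, parameter substitution, file inclusion, string manipulation, conditional evaluation, arithmetic expressions, a system interface, and programmer diagnostics.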

Unlike most earlier macro processors, m4 does not target any particular computer or human language; historically, however, it was developed to support the Ratfor dialect of Fortran. Unlike some other macro processors, m4 is Turing-complete as well as a practical programming language.

Unquoted identifiers that match defined macros are replaced with their definitions. Placing identifiers in quotes suppresses expansion, possibly until later, for example when a quoted string is expanded as part of a macro replacement. Unlike in most languages, strings in m4 are quoted by default using the backtick (`) as the opening delimiter and the apostrophe (') as the closing delimiter. Because the opening and closing delimiters differ, quotation marks can be nested arbitrarily, which gives fine control over how and when macro expansion takes place in different parts of a string.

Example

The following fragment gives a simple example that could form part of a library for generating HTML code. It defines a commented macro to number sections automatically:

divert(-1)

m4 has multiple output queues that can be manipulated with the
`divert' macro. Valid queues range from 0 to 9, inclusive, with
the default queue being 0. As an extension, GNU m4 supports more
diversions, limited only by integer type size.

Calling the `divert' macro with an invalid queue causes text to be
discarded until another call. Note that even while output is being
discarded, quotes around `divert' and other macros are needed to
prevent expansion.

# Macros aren't expanded within comments, meaning that keywords such
# as divert and other built-ins may be used without consequence.

# HTML utility macro:

define(`H2_COUNT', 0)

# The H2_COUNT macro is redefined every time the H2 macro is used:

define(`H2',
	`define(`H2_COUNT', incr(H2_COUNT))<h2>H2_COUNT. $1</h2>')

divert(1)dnl
dnl
dnl The dnl macro causes m4 to discard the rest of the line, thus
dnl preventing unwanted blank lines from appearing in the output.
dnl
H2(First Section)
H2(Second Section)
H2(Conclusion)
dnl
divert(0)dnl
dnl
<HTML>
undivert(1)dnl One of the queues is being pushed to output.
</HTML>

Processing this code with m4 generates the following text:

<HTML>
<h2>1. First Section</h2>
<h2>2. Second Section</h2>
<h2>3. Conclusion</h2>
</HTML>

Implementations

FreeBSD, NetBSD, and OpenBSD provide independent implementations of the m4 language. In addition, the Heirloom Project Development Tools include a free version of m4 derived from OpenSolaris.

M4 has been included in the Inferno operating system. This implementation is more closely related to the original m4 developed by Kernighan and Ritchie in Version 7 Unix than its more sophisticated relatives in UNIX System V and POSIX. [5]

GNU m4 is an implementation of m4 for the GNU Project. [6] [7] It is designed to avoid many kinds of arbitrary limits found in traditional m4 implementations, such as maximum line lengths, maximum size of a macro and number of macros. Removing such arbitrary limits is one of the stated goals of the GNU Project. [8]

The GNU Autoconf package makes extensive use of the features of GNU m4.

GNU m4 is currently maintained by Gary V. Vaughan and Eric Blake. [6] GNU m4 is free software, released under the terms of the GNU General Public License.

References

  1. Brian W. Kernighan and Dennis M. Ritchie. The m4 macro processor. Technical report, Bell Laboratories, Murray Hill, New Jersey, USA, 1977.
  2. History of GNU m4.
  3. Hall, Andrew D. (1972). The M6 Macro Processor. Computing Science Technical Report #2. Bell Labs.
  4. Kenneth J. Turner. Exploiting the m4 macro language. Technical Report CSM-126, Department of Computing Science and Mathematics, University of Stirling, Scotland, September 1994.
  5. m4(1). Inferno General Commands Manual.
  6. "GNU M4". GNU m4 web site. Accessed January 25, 2020.
  7. "GNU M4 GNU macro processor". GNU m4 manual, online and for download in HTML, PDF, and other forms. Accessed January 25, 2020.
  8. "GNU Coding Standards: Writing Robust Programs". Quote: "Avoid arbitrary limits on the length or number of any data structure".