Macro (computer science)

Last updated
jEdit's macro editor Jedit macro recorder.png
jEdit's macro editor

A macro (short for "macroinstruction", from Greek μακρός 'long') in computer science is a rule or pattern that specifies how a certain input sequence (often a sequence of characters) should be mapped to a replacement output sequence (also often a sequence of characters) according to a defined procedure. The mapping process that instantiates (transforms) a macro use into a specific sequence is known as macro expansion. A facility for writing macros may be provided as part of a software application or as a part of a programming language. In the former case, macros are used to make tasks using the application less repetitive. In the latter case, they are a tool that allows a programmer to enable code reuse or even to design domain-specific languages.

Contents

Macros are used to make a sequence of computing instructions available to the programmer as a single program statement, making the programming task less tedious and less error-prone. [1] [2] (Thus, they are called "macros" because a "big" block of code can be expanded from a "small" sequence of characters.) Macros often allow positional or keyword parameters that dictate what the conditional assembler program generates and have been used to create entire programs or program suites according to such variables as operating system, platform or other factors. The term derives from "macro instruction", and such expansions were originally used in generating assembly language code.

Keyboard and mouse macros

Keyboard macros and mouse macros allow short sequences of keystrokes and mouse actions to transform into other, usually more time-consuming, sequences of keystrokes and mouse actions. In this way, frequently used or repetitive sequences of keystrokes and mouse movements can be automated. Separate programs for creating these macros are called macro recorders.

During the 1980s, macro programs – originally SmartKey, then SuperKey, KeyWorks, Prokey – were very popular, first as a means to automatically format screenplays, then for a variety of user input tasks. These programs were based on the TSR (terminate and stay resident) mode of operation and applied to all keyboard input, no matter in which context it occurred. They have to some extent fallen into obsolescence following the advent of mouse-driven user interfaces and the availability of keyboard and mouse macros in applications such as word processors and spreadsheets, making it possible to create application-sensitive keyboard macros.

Keyboard macros have in more recent times come to life as a method of exploiting the economy of massively multiplayer online role-playing games (MMORPGs). By tirelessly performing a boring, repetitive, but low risk action, a player running a macro can earn a large amount of the game's currency or resources. This effect is even larger when a macro-using player operates multiple accounts simultaneously, or operates the accounts for a large amount of time each day. As this money is generated without human intervention, it can dramatically upset the economy of the game. For this reason, use of macros is a violation of the TOS or EULA of most MMORPGs, and administrators of MMORPGs fight a continual war to identify and punish macro users. [3]

Application macros and scripting

Keyboard and mouse macros that are created using an application's built-in macro features are sometimes called application macros. They are created by carrying out the sequence once and letting the application record the actions. An underlying macro programming language, most commonly a scripting language, with direct access to the features of the application may also exist.

The programmers' text editor, Emacs, (short for "editing macros") follows this idea to a conclusion. In effect, most of the editor is made of macros. Emacs was originally devised as a set of macros in the editing language TECO; it was later ported to dialects of Lisp.

Another programmers' text editor, Vim (a descendant of vi), also has full implementation of macros. It can record into a register (macro) what a person types on the keyboard and it can be replayed or edited just like VBA macros for Microsoft Office. Vim also has a scripting language called Vimscript [4] to create macros.

Visual Basic for Applications (VBA) is a programming language included in Microsoft Office from Office 97 through Office 2019 (although it was available in some components of Office prior to Office 97). However, its function has evolved from and replaced the macro languages that were originally included in some of these applications.

Macro virus

VBA has access to most Microsoft Windows system calls and executes when documents are opened. This makes it relatively easy to write computer viruses in VBA, commonly known as macro viruses. In the mid-to-late 1990s, this became one of the most common types of computer virus. However, during the late 1990s and to date, Microsoft has been patching and updating their programs. In addition, current anti-virus programs immediately counteract such attacks.

Parameterized macro

A parameterized macro is a macro that is able to insert given objects into its expansion. This gives the macro some of the power of a function.

As a simple example, in the C programming language, this is a typical macro that is not a parameterized macro:

#define PI   3.14159

This causes the string "PI" to be replaced with "3.14159" wherever it occurs. It will always be replaced by this string, and the resulting string cannot be modified in any way. An example of a parameterized macro, on the other hand, is this:

#define pred(x)  ((x)-1)

What this macro expands to depends on what argument x is passed to it. Here are some possible expansions:

 pred(2)    →  ((2)   -1)  pred(y+2)  →  ((y+2) -1)  pred(f(5)) →  ((f(5))-1)

Parameterized macros are a useful source-level mechanism for performing in-line expansion, but in languages such as C where they use simple textual substitution, they have a number of severe disadvantages over other mechanisms for performing in-line expansion, such as inline functions.

The parameterized macros used in languages such as Lisp, PL/I and Scheme, on the other hand, are much more powerful, able to make decisions about what code to produce based on their arguments; thus, they can effectively be used to perform run-time code generation.

Text-substitution macros

Languages such as C and some assembly languages have rudimentary macro systems, implemented as preprocessors to the compiler or assembler. C preprocessor macros work by simple textual substitution at the token, rather than the character level. However, the macro facilities of more sophisticated assemblers, e.g., IBM High Level Assembler (HLASM) can't be implemented with a preprocessor; the code for assembling instructions and data is interspersed with the code for assembling macro invocations.

A classic use of macros is in the computer typesetting system TeX and its derivatives, where most of the functionality is based on macros.

MacroML is an experimental system that seeks to reconcile static typing and macro systems. Nemerle has typed syntax macros, and one productive way to think of these syntax macros is as a multi-stage computation.

Other examples:

Some major applications have been written as text macro invoked by other applications, e.g., by XEDIT in CMS.

Embeddable languages

Some languages, such as PHP, can be embedded in free-format text, or the source code of other languages. The mechanism by which the code fragments are recognised (for instance, being bracketed by <?php and ?>) is similar to a textual macro language, but they are much more powerful, fully featured languages.

Procedural macros

Macros in the PL/I language are written in a subset of PL/I itself: the compiler executes "preprocessor statements" at compilation time, and the output of this execution forms part of the code that is compiled. The ability to use a familiar procedural language as the macro language gives power much greater than that of text substitution macros, at the expense of a larger and slower compiler.

Frame technology's frame macros have their own command syntax but can also contain text in any language. Each frame is both a generic component in a hierarchy of nested subassemblies, and a procedure for integrating itself with its subassembly frames (a recursive process that resolves integration conflicts in favor of higher level subassemblies). The outputs are custom documents, typically compilable source modules. Frame technology can avoid the proliferation of similar but subtly different components, an issue that has plagued software development since the invention of macros and subroutines.

Most assembly languages have less powerful procedural macro facilities, for example allowing a block of code to be repeated N times for loop unrolling; but these have a completely different syntax from the actual assembly language.

Syntactic macros

Macro systems—such as the C preprocessor described earlier—that work at the level of lexical tokens cannot preserve the lexical structure reliably. Syntactic macro systems work instead at the level of abstract syntax trees, and preserve the lexical structure of the original program. The most widely used implementations of syntactic macro systems are found in Lisp-like languages. These languages are especially suited for this style of macro due to their uniform, parenthesized syntax (known as S-expressions). In particular, uniform syntax makes it easier to determine the invocations of macros. Lisp macros transform the program structure itself, with the full language available to express such transformations. While syntactic macros are often found in Lisp-like languages, they are also available in other languages such as Prolog, Dylan, Scala, Nemerle, Rust, Elixir, Nim, Haxe, [5] , and Julia. They are also available as third-party extensions to JavaScript, [6] C# and Python. [7] . [8]

Early Lisp macros

Before Lisp had macros, it had so-called FEXPRs, function-like operators whose inputs were not the values computed by the arguments but rather the syntactic forms of the arguments, and whose output were values to be used in the computation. In other words, FEXPRs were implemented at the same level as EVAL, and provided a window into the meta-evaluation layer. This was generally found to be a difficult model to reason about effectively. [9]

In 1963, Timothy Hart proposed adding macros to Lisp 1.5 in AI Memo 57: MACRO Definitions for LISP. [10]

Anaphoric macros

An anaphoric macro is a type of programming macro that deliberately captures some form supplied to the macro which may be referred to by an anaphor (an expression referring to another). Anaphoric macros first appeared in Paul Graham's On Lisp and their name is a reference to linguistic anaphora—the use of words as a substitute for preceding words.

Hygienic macros

In the mid-eighties, a number of papers [11] [12] introduced the notion of hygienic macro expansion (syntax-rules), a pattern-based system where the syntactic environments of the macro definition and the macro use are distinct, allowing macro definers and users not to worry about inadvertent variable capture (cf. referential transparency). Hygienic macros have been standardized for Scheme in the R5RS, R6RS, and R7RS standards. A number of competing implementations of hygienic macros exist such as syntax-rules, syntax-case, explicit renaming, and syntactic closures. Both syntax-rules and syntax-case have been standardized in the Scheme standards.

Recently, Racket has combined the notions of hygienic macros with a "tower of evaluators", so that the syntactic expansion time of one macro system is the ordinary runtime of another block of code, [13] and showed how to apply interleaved expansion and parsing in a non-parenthesized language. [14]

A number of languages other than Scheme either implement hygienic macros or implement partially hygienic systems. Examples include Scala, Rust, Elixir, Julia, Dylan, and Nemerle.

Applications

Evaluation order
Macro systems have a range of uses. Being able to choose the order of evaluation (see lazy evaluation and non-strict functions) enables the creation of new syntactic constructs (e.g. control structures) indistinguishable from those built into the language. For instance, in a Lisp dialect that has cond but lacks if, it is possible to define the latter in terms of the former using macros. For example, Scheme has both continuations and hygienic macros, which enables a programmer to design their own control abstractions, such as looping and early exit constructs, without the need to build them into the language.
Data sub-languages and domain-specific languages
Next, macros make it possible to define data languages that are immediately compiled into code, which means that constructs such as state machines can be implemented in a way that is both natural and efficient. [15]
Binding constructs
Macros can also be used to introduce new binding constructs. The most well-known example is the transformation of let into the application of a function to a set of arguments.

Felleisen conjectures [16] that these three categories make up the primary legitimate uses of macros in such a system. Others have proposed alternative uses of macros, such as anaphoric macros in macro systems that are unhygienic or allow selective unhygienic transformation.

The interaction of macros and other language features has been a productive area of research. For example, components and modules are useful for large-scale programming, but the interaction of macros and these other constructs must be defined for their use together. Module and component-systems that can interact with macros have been proposed for Scheme and other languages with macros. For example, the Racket language extends the notion of a macro system to a syntactic tower, where macros can be written in languages including macros, using hygiene to ensure that syntactic layers are distinct and allowing modules to export macros to other modules.

Macros for machine-independent software

Macros are normally used to map a short string (macro invocation) to a longer sequence of instructions. Another, less common, use of macros is to do the reverse: to map a sequence of instructions to a macro string. This was the approach taken by the STAGE2 Mobile Programming System, which used a rudimentary macro compiler (called SIMCMP) to map the specific instruction set of a given computer to counterpart machine-independent macros. Applications (notably compilers) written in these machine-independent macros can then be run without change on any computer equipped with the rudimentary macro compiler. The first application run in such a context is a more sophisticated and powerful macro compiler, written in the machine-independent macro language. This macro compiler is applied to itself, in a bootstrap fashion, to produce a compiled and much more efficient version of itself. The advantage of this approach is that complex applications can be ported from one computer to a very different computer with very little effort (for each target machine architecture, just the writing of the rudimentary macro compiler). [17] [18] The advent of modern programming languages, notably C, for which compilers are available on virtually all computers, has rendered such an approach superfluous. This was, however, one of the first instances (if not the first) of compiler bootstrapping.

Assembly language

While macro instructions can be defined by a programmer for any set of native assembler program instructions, typically macros are associated with macro libraries delivered with the operating system allowing access to operating system functions such as

In older operating systems such as those used on IBM mainframes, full operating system functionality was only available to assembler language programs, not to high level language programs (unless assembly language subroutines were used, of course), as the standard macro instructions did not always have counterparts in routines available to high-level languages.

History

In the mid-1950s, when assembly language programming was commonly used to write programs for digital computers, the use of macro instructions was initiated for two main purposes: to reduce the amount of program coding that had to be written by generating several assembly language statements from one macro instruction and to enforce program writing standards, e.g. specifying input/output commands in standard ways. [21] Macro instructions were effectively a middle step between assembly language programming and the high-level programming languages that followed, such as FORTRAN and COBOL. Two of the earliest programming installations to develop "macro languages" for the IBM 705 computer were at Dow Chemical Corp. in Delaware and the Air Material Command, Ballistics Missile Logistics Office in California. A macro instruction written in the format of the target assembly language would be processed by a macro compiler, which was a pre-processor to the assembler, to generate one or more assembly language instructions to be processed next by the assembler program that would translate the assembly language instructions into machine language instructions. [22]

By the late 1950s the macro language was followed by the Macro Assemblers. This was a combination of both where one program served both functions, that of a macro pre-processor and an assembler in the same package. [22] [ failed verification ]

In 1959, Douglas E. Eastwood and Douglas McIlroy of Bell Labs introduced conditional and recursive macros into the popular SAP assembler, [23] creating what is known as Macro SAP. [24] McIlroy's 1960 paper was seminal in the area of extending any (including high-level) programming languages through macro processors. [25] [23]

Macro Assemblers allowed assembly language programmers to implement their own macro-language and allowed limited portability of code between two machines running the same CPU but different operating systems, for example, early versions of MSDOS and CPM-86. The macro library would need to be written for each target machine but not the overall assembly language program. Note that more powerful macro assemblers allowed use of conditional assembly constructs in macro instructions that could generate different code on different machines or different operating systems, reducing the need for multiple libraries.[ citation needed ]

In the 1980s and early 1990s, desktop PCs were only running at a few MHz and assembly language routines were commonly used to speed up programs written in C, Fortran, Pascal and others. These languages, at the time, used different calling conventions. Macros could be used to interface routines written in assembly language to the front end of applications written in almost any language. Again, the basic assembly language code remained the same, only the macro libraries needed to be written for each target language.[ citation needed ]

In modern operating systems such as Unix and its derivatives, operating system access is provided through subroutines, usually provided by dynamic libraries. High-level languages such as C offer comprehensive access to operating system functions, obviating the need for assembler language programs for such functionality.[ citation needed ]

See also

Related Research Articles

Assembly language Low level programming language

In computer programming, assembly language, often abbreviated asm, is any low-level programming language in which there is a very strong correspondence between the instructions in the language and the architecture's machine code instructions. Because assembly depends on the machine code instructions, every assembler has its own assembly language which is designed for exactly one specific computer architecture. Assembly language may also be called symbolic machine code.

A compiler is a computer program that translates computer code written in one programming language into another language. The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language to create an executable program.

Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 (R2004). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

Lisp (programming language) Programming language

Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today. Only Fortran is older, by one year. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Clojure, Common Lisp, and Scheme.

Scheme is a programming language that supports multiple paradigms, including functional and imperative programming. It is one of the three main dialects of Lisp, alongside Common Lisp and Clojure. Unlike Common Lisp, Scheme follows a minimalist design philosophy, specifying a small standard core with powerful tools for language extension.

In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without changing the source code, while macro expansion occurs prior to compilation, and results in different text that is then processed by the compiler.

The C preprocessor or cpp is the macro preprocessor for the C and C++ computer programming languages. The preprocessor provides the ability for the inclusion of header files, macro expansions, conditional compilation, and line control.

Hygienic macros are macros whose expansion is guaranteed not to cause the accidental capture of identifiers. They are a feature of programming languages such as Scheme, Dylan, Rust, and Julia. The general problem of accidental capture was well known within the Lisp community prior to the introduction of hygienic macros. Macro writers would use language features that would generate unique identifiers or use obfuscated identifiers in order to avoid the problem. Hygienic macros are a programmatic solution to the capture problem that is integrated into the macro expander itself. The term "hygiene" was coined in Kohlbecker et al.'s 1986 paper that introduced hygienic macro expansion, inspired by the terminology used in mathematics.

Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself while running. In some cases, this allows programmers to minimize the number of lines of code to express a solution, in turn reducing development time. It also allows programs greater flexibility to efficiently handle new situations without recompilation.

High Level Assembly (HLA) is a high-level assembly language developed by Randall Hyde. It allows the use of higher-level language constructs to aid both beginners and advanced assembly developers. It fully supports advanced data types and object-oriented programming. It uses a syntax loosely based on several high-level programming languages (HLLs), such as Pascal, Ada, Modula-2, and C++, to allow creating readable assembly language programs, and to allow HLL programmers to learn HLA as fast as possible.

In computer programming, a directive or pragma is a language construct that specifies how a compiler should process its input. Directives are not part of the grammar of a programming language, and may vary from compiler to compiler. They can be processed by a preprocessor to specify compiler behavior, or function as a form of in-band parameterization.

In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.

Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this style of programming, were an active area of work in the 1960s, but the movement was marginalized in the 1970s. Extensible programming has become a topic of renewed interest in the 21st century.

Embeddable Common Lisp (ECL) is a small implementation of the ANSI Common Lisp programming language that can be used stand-alone or embedded in extant applications written in C. It creates OS-native executables and libraries from Common Lisp code, and runs on most platforms that support a C compiler. The ECL runtime is a dynamically loadable library for use by applications. It is distributed as free and open-source software under a GNU Lesser Public License (LGPL) 2.1+.

Syntax (programming languages) in programming languages

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be a correctly structured document or fragment in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

Racket (programming language) programming language

Racket is a general-purpose, multi-paradigm programming language based on the Scheme dialect of Lisp. It is designed to be a platform for programming language design and implementation. In addition to the core Racket language, Racket is also used to refer to the family of Racket programming languages and the set of tools supporting development on and with Racket. Racket is also used for scripting, computer science education, and research.

SAM76 is a macro programming language used from the late 1970s to the present 2007 initially ran on CP/M.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

References

  1. Greenwald, Irwin D.; Maureen Kane (April 1959). "The Share 709 System: Programming and Modification". Journal of the ACM. New York, NY, USA: ACM. 6 (2): 128–133. doi:10.1145/320964.320967. One of the important uses of programmer macros is to save time and clerical-type errors in writing sequence of instructions which are often repeated in the course of a program.
  2. Strachey, Christopher (October 1965). "A General Purpose Macrogenerator". Computer Journal. 8 (3): 225–241. doi:10.1093/comjnl/8.3.225.
  3. "Runescape: The Massive Online Adventure Game by Jagex Ltd" . Retrieved 2008-04-03.
  4. "scripts : vim online". www.vim.org.
  5. "Macros". Haxe - The Cross-platform Toolkit.
  6. "Sweet.js - Hygienic Macros for JavaScript". www.sweetjs.org.
  7. "Macros in Python: quasiquotes, case classes, LINQ and more!: lihaoyi/macropy". 7 February 2019 via GitHub.
  8. "LeMP Home Page · Enhanced C#". ecsharp.net.
  9. Marshall, Joe. "untitled email" . Retrieved May 3, 2012.
  10. "AIM-057, MACRO Definitions for LISP, Timothy P. Hart". hdl:1721.1/6111.Cite journal requires |journal= (help)
  11. Kohlbecker, Eugene; Friedman, Daniel; Felleisen, Matthias; Duba, Bruce. "Hygienic Macro Expansion". doi:10.1145/319838.319859.
  12. Clinger, Rees. "Macros that Work"
  13. Flatt, Matthew. "Composable and compilable macros: you want it when?" (PDF).
  14. Rafkind, Jon; Flatt, Matthew. "Honu: Syntactic Extension for Algebraic Notation through Enforestation" (PDF).
  15. "Automata via Macros". cs.brown.edu.
  16. , Matthias Felleisen, LL1 mailing list posting
  17. Orgass, Richard J.; William M. Waite (September 1969). "A base for a mobile programming system". Communications of the ACM. New York, NY, USA: ACM. 12 (9): 507–510. doi:10.1145/363219.363226.
  18. Waite, William M. (July 1970). "The mobile programming system: STAGE2". Communications of the ACM. New York, NY, USA: ACM. 13 (7): 415–421. doi:10.1145/362686.362691.
  19. "University of North Florida" (PDF).
  20. "DTF (DOS/VSE)".
  21. "IBM Knowledge Center". IBM Knowledge Center.
  22. 1 2 "Assembler Language Macro Instructions". Cisco.
  23. 1 2 Holbrook, Bernard D.; Brown, W. Stanley. "Computing Science Technical Report No. 99 – A History of Computing Research at Bell Laboratories (1937–1975)". Bell Labs. Archived from the original on September 2, 2014. Retrieved February 2, 2020.
  24. "Macro SAP – Macro compiler modification of SAP". HOPL: Online Historical Encyclopaedia of Programming Languages. Archived from the original on August 13, 2008.
  25. Layzell, P. (1985). "The History of Macro Processors in Programming Language Extensibility". The Computer Journal. 28 (1): 29–33.