This article may be too technical for most readers to understand.(November 2016) |
In computer science, hygienic macros are macros whose expansion is guaranteed not to cause the accidental capture of identifiers. They are a feature of programming languages such as Scheme, [1] Dylan, [2] Rust, Nim, and Julia. The general problem of accidental capture was well known in the Lisp community before the introduction of hygienic macros. Macro writers would use language features that would generate unique identifiers (e.g., gensym) or use obfuscated identifiers to avoid the problem. Hygienic macros are a programmatic solution to the capture problem that is integrated into the macro expander. The term "hygiene" was coined in Kohlbecker et al.'s 1986 paper that introduced hygienic macro expansion, inspired by terminology used in mathematics. [3]
In programming languages that have non-hygienic macro systems, it is possible for existing variable bindings to be hidden from a macro by variable bindings that are created during its expansion. In C, this problem can be illustrated by the following fragment:
#define INCI(i) { int a=0; ++i; }intmain(void){inta=4,b=8;INCI(a);INCI(b);printf("a is now %d, b is now %d\n",a,b);return0;}
Running the above through the C preprocessor produces:
intmain(void){inta=4,b=8;{inta=0;++a;};{inta=0;++b;};printf("a is now %d, b is now %d\n",a,b);return0;}
The variable a
declared in the top scope is shadowed by the a
variable in the macro, which introduces a new scope. As a result, a
is never altered by the execution of the program, as the output of the compiled program shows:
a is now 4, b is now 9
The hygiene problem can extend beyond variable bindings. Consider this Common Lisp macro:
(defmacromy-unless(condition&bodybody)`(if(not,condition)(progn,@body)))
While there are no references to variables in this macro, it assumes the symbols "if", "not", and "progn" are all bound to their usual definitions in the standard library. If, however the above macro is used in the following code:
(flet((not(x)x))(my-unlesst(formatt"This should not be printed!")))
The definition of "not" has been locally altered and so the expansion of my-unless
changes.
Note however that for Common Lisp this behavior is forbidden, as per 11.1.2.1.2 Constraints on the COMMON-LISP Package for Conforming Programs. It is also possible to completely redefine functions anyway. Some implementations of Common Lisp provide Package Locks to prevent the user to change definitions in packages by mistake.
Of course, the problem can occur for program-defined functions in a similar way:
(defunuser-defined-operator(cond)(notcond))(defmacromy-unless(condition&bodybody)`(if(user-defined-operator,condition)(progn,@body))); ... later ...(flet((user-defined-operator(x)x))(my-unlesst(formatt"This should not be printed!")))
The use site redefines user-defined-operator
and hence changes the behavior of the macro.
The hygiene problem can be resolved with conventional macros using several alternative solutions.
The simplest solution, if temporary storage is needed during macro expansion, is to use unusual variables names in the macro in hope that the same names will never be used by the rest of the program.
#define INCI(i) { int INCIa = 0; ++i; }intmain(void){inta=4,b=8;INCI(a);INCI(b);printf("a is now %d, b is now %d\n",a,b);return0;}
Until a variable named INCIa
is created, this solution produces the correct output:
a is now 5, b is now 9
The problem is solved for the current program, but this solution is not robust. The variables used inside the macro and those in the rest of the program have to be kept in sync by the programmer. Specifically, using the macro INCI
on a variable INCIa
is going to fail in the same way that the original macro failed on a variable a
.
In some programming languages, it is possible for a new variable name, or symbol, to be generated and bound to a temporary location. The language processing system ensures that this never clashes with another name or location in the execution environment. The responsibility for choosing to use this feature within the body of a macro definition is left to the programmer. This method was used in MacLisp, where a function named gensym
could be used to generate a new symbol name. Similar functions (usually named gensym
as well) exist in many Lisp-like languages, including the widely implemented Common Lisp standard [4] and Elisp.
Although symbol creation solves the variable shadowing issue, it does not directly solve the issue of function redefinition. [5] However, gensym
, macro facilities, and standard library functions are sufficient to embed hygienic macros in an unhygienic language. [6]
This is similar to obfuscation in that a single name is shared by multiple expansions of the same macro. Unlike an unusual name, however, a read time uninterned symbol is used (denoted by the #:
notation), for which it is impossible to occur outside of the macro, similar to gensym
.
Using packages such as in Common Lisp, the macro simply uses a private symbol from the package in which the macro is defined. The symbol will not accidentally occur in user code. User code would have to reach inside the package using the double colon (::
) notation to give itself permission to use the private symbol, for instance cool-macros::secret-sym
. At that point, the issue of accidental lack of hygiene is moot. Furthermore the ANSI Common Lisp standard categorizes redefining standard functions and operators, globally or locally, as invoking undefined behavior. Such usage can be thus diagnosed by the implementation as erroneous. Thus the Lisp package system provide a viable, complete solution to the macro hygiene problem, which can be regarded as an instance of name clashing.
For example, in the program-defined function redefinition example, the my-unless
macro can reside in its own package, where user-defined-operator
is a private symbol in that package. The symbol user-defined-operator
occurring in the user code will then be a different symbol, unrelated to the one used in the definition of the my-unless
macro.
In some languages the expansion of a macro does not need to correspond to textual code; rather than expanding to an expression containing the symbol f
, a macro may produce an expansion containing the actual object referred to by f
. Similarly if the macro needs to use local variables or objects defined in the macro's package, it can expand to an invocation of a closure object whose enclosing lexical environment is that of the macro definition.
Hygienic macro systems in languages such as Scheme use a macro expansion process that preserves the lexical scoping of all identifiers and prevents accidental capture. This property is called referential transparency. In cases where capture is desired, some systems allow the programmer to explicitly violate the hygiene mechanisms of the macro system.
For example, Scheme's let-syntax
and define-syntax
macro creation systems are hygienic, so the following Scheme implementation of my-unless
will have the desired behavior:
(define-syntaxmy-unless(syntax-rules()((_conditionbody...)(if(notcondition)(beginbody...)))))(let((not(lambda(x)x)))(my-unless#t(display"This should not be printed!")(newline)))
The hygienic macro processor responsible for transforming the patterns of the input form into an output form detects symbol clashes and resolves them by temporarily changing the names of symbols. The basic strategy is to identify bindings in the macro definition and replace those names with gensyms, and to identify free variables in the macro definition and make sure those names are looked up in the scope of the macro definition instead of the scope where the macro was used.
Macro systems that automatically enforce hygiene originated with Scheme. The original KFFD algorithm for a hygienic macro system was presented by Kohlbecker in '86. [3] At the time, no standard macro system was adopted by Scheme implementations. Shortly thereafter in '87, Kohlbecker and Wand proposed a declarative pattern-based language for writing macros, which was the predecessor to the syntax-rules
macro facility adopted by the R5RS standard. [1] [7] Syntactic closures, an alternative hygiene mechanism, was proposed as an alternative to Kohlbecker et al.'s system by Bawden and Rees in '88. [8] Unlike the KFFD algorithm, syntactic closures require the programmer to explicitly specify the resolution of the scope of an identifier. In 1993, Dybvig et al. introduced the syntax-case
macro system, which uses an alternative representation of syntax and maintains hygiene automatically. [9] The syntax-case
system can express the syntax-rules
pattern language as a derived macro. The term macro system can be ambiguous because, in the context of Scheme, it can refer to both a pattern-matching construct (e.g., syntax-rules) and a framework for representing and manipulating syntax (e.g., syntax-case, syntactic closures).
Syntax-rules is a high-level pattern matching facility that attempts to make macros easier to write. However, syntax-rules
is not able to succinctly describe certain classes of macros and is insufficient to express other macro systems. Syntax-rules was described in the R4RS document in an appendix but not mandated. Later, R5RS adopted it as a standard macro facility. Here is an example syntax-rules
macro that swaps the value of two variables:
(define-syntaxswap!(syntax-rules()((_ab)(let((tempa))(set!ab)(set!btemp)))))
Due to the deficiencies of a purely syntax-rules
based macro system, the R6RS Scheme standard adopted the syntax-case macro system. [10] Unlike syntax-rules
, syntax-case
contains both a pattern matching language and a low-level facility for writing macros. The former allows macros to be written declaratively, while the latter allows the implementation of alternative frontends for writing macros. The swap example from before is nearly identical in syntax-case
because the pattern matching language is similar:
(define-syntaxswap!(lambda(stx)(syntax-casestx()((_ab)(syntax(let((tempa))(set!ab)(set!btemp)))))))
However, syntax-case
is more powerful than syntax-rules. For example, syntax-case
macros can specify side-conditions on its pattern matching rules via arbitrary Scheme functions. Alternatively, a macro writer can choose not to use the pattern matching frontend and manipulate the syntax directly. Using the datum->syntax
function, syntax-case macros can also intentionally capture identifiers, thus breaking hygiene.
Other macro systems have also been proposed and implemented for Scheme. Syntactic closures and explicit renaming [11] are two alternative macro systems. Both systems are lower-level than syntax-rules and leave the enforcement of hygiene to the macro writer. This differs from both syntax-rules and syntax-case, which automatically enforce hygiene by default. The swap examples from above are shown here using a syntactic closure and explicit renaming implementation respectively:
;; syntactic closures(define-syntaxswap!(sc-macro-transformer(lambda(formenvironment)(let((a(close-syntax(cadrform)environment))(b(close-syntax(caddrform)environment)))`(let((temp,a))(set!,a,b)(set!,btemp))))));; explicit renaming(define-syntaxswap!(er-macro-transformer(lambda(formrenamecompare)(let((a(cadrform))(b(caddrform))(temp(rename'temp)))`(,(rename'let)((,temp,a))(,(rename'set!),a,b)(,(rename'set!),b,temp))))))
Hygienic macros offer safety and referential transparency at the expense of making intentional variable capture less straight-forward. Doug Hoyte, author of Let Over Lambda, writes: [16]
Almost all approaches taken to reducing the impact of variable capture serve only to reduce what you can do with defmacro. Hygienic macros are, in the best of situations, a beginner's safety guard-rail; in the worst of situations they form an electric fence, trapping their victims in a sanitised, capture-safe prison.
— Doug Hoyte
Many hygienic macro systems do offer escape hatches without compromising on the guarantees that hygiene provides; for instance, Racket allows you to define syntax parameters, which allow you to selectively introduce bound variables. Gregg Hendershott gives an example at Fear of Macros [17] of implementing an anaphoric if operator in this way.
Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ANSI INCITS 226-1994 (S2018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.
Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in the late 1950s, it is the second-oldest high-level programming language still in common use, after Fortran. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Common Lisp, Scheme, Racket, and Clojure.
In computer programming, a macro is a rule or pattern that specifies how a certain input should be mapped to a replacement output. Applying a macro to an input is known as macro expansion. The input and output may be a sequence of lexical tokens or characters, or a syntax tree. Character macros are supported in software applications to make it easy to invoke common command sequences. Token and tree macros are supported in some programming languages to enable code reuse or to extend the language, sometimes for domain-specific languages.
In computer programming, operator overloading, sometimes termed operator ad hoc polymorphism, is a specific case of polymorphism, where different operators have different implementations depending on their arguments. Operator overloading is generally defined by a programming language, a programmer, or both.
Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.
In a computer language, a reserved word is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use". This is a syntactic definition, and a reserved word may have no user-defined meaning.
In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.
In computer programming, the scope of a name binding is the part of a program where the name binding is valid; that is, where the name can be used to refer to the entity. In other parts of the program, the name may refer to a different entity, or to nothing at all. Scope helps prevent name collisions by allowing the same name to refer to different objects – as long as the names have separate scopes. The scope of a name binding is also known as the visibility of an entity, particularly in older or more technical literature—this is in relation to the referenced entity, not the referencing name.
In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.
In computer science, a preprocessor is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
In computer programming, M-expressions were an early proposed syntax for the Lisp programming language, inspired by contemporary languages such as Fortran and ALGOL. The notation was never implemented into the language and, as such, it was never finalized.
In computer programming, a function object is a construct allowing an object to be invoked or called as if it were an ordinary function, usually with the same syntax. In some languages, particularly C++, function objects are often called functors.
MLISP is a variant of Lisp with an Algol-like syntax based on M-Expressions, which were the function syntax in the original description of Lisp by John McCarthy. McCarthy's M-expressions were never implemented in an exact form.
In computer science, extensible programming is a style of computer programming that focuses on mechanisms to extend the programming language, compiler, and runtime system (environment). Extensible programming languages, supporting this style of programming, were an active area of work in the 1960s, but the movement was marginalized in the 1970s. Extensible programming has become a topic of renewed interest in the 21st century.
In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for identifiers which denote variables, types, functions, and other entities in source code and documentation.
In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.
Racket is a general-purpose, multi-paradigm programming language. The Racket language is a modern dialect of Lisp and a descendant of Scheme. It is designed as a platform for programming language design and implementation. In addition to the core Racket language, Racket is also used to refer to the family of programming languages and set of tools supporting development on and with Racket. Racket is also used for scripting, computer science education, and research.
In computer programming, a constant is a value that is not altered by the program during normal execution. When associated with an identifier, a constant is said to be "named," although the terms "constant" and "named constant" are often used interchangeably. This is contrasted with a variable, which is an identifier with a value that can be changed during normal execution. To simplify, constants' values remains, while the values of variables varies, hence both their names.
Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level system programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.
This article includes a list of general references, but it lacks sufficient corresponding inline citations .(April 2012) |