Metaprogramming

Metaprogramming is a computer programming technique in which computer programs have the ability to treat other programs as their data. This means that a program can be designed to read, generate, analyse, or transform other programs, and even modify itself, while running. [1] [2] In some cases, this allows programmers to minimize the number of lines of code needed to express a solution, in turn reducing development time. [3] It also gives programs greater flexibility to handle new situations efficiently without recompiling.

Metaprogramming can be used to move computations from runtime to compile time, to generate code using compile time computations, and to enable self-modifying code. The ability of a programming language to be its own metalanguage allows reflective programming, and is termed reflection. [4] Reflection is a valuable language feature to facilitate metaprogramming.

Metaprogramming was popular in the 1970s and 1980s in list processing languages such as Lisp. Lisp machine hardware gained some notice in the 1980s and enabled applications that could process code; these machines were often used for artificial intelligence applications.

Approaches

Metaprogramming enables developers to write programs and develop code that falls under the generic programming paradigm. Having the programming language itself as a first-class data type (as in Lisp, Prolog, SNOBOL, or Rebol) is also very useful; this is known as homoiconicity. Generic programming invokes a metaprogramming facility within a language by allowing code to be written without specifying data types, since the types can be supplied as parameters when the code is used.

Metaprogramming usually works in one of three ways. [5]

  1. The first approach is to expose the internals of the runtime system (engine) to the programming code through application programming interfaces (APIs) like that for the .NET Common Intermediate Language (CIL) emitter.
  2. The second approach is dynamic execution of expressions that contain programming commands, often composed from strings, though they can also come from other methods (such as anonymous functions) using arguments or context, as in JavaScript. [6] Thus, "programs can write programs." Although the first two approaches can both be used in the same language, most languages tend to lean toward one or the other.
  3. The third approach is to step outside the language entirely. General-purpose program transformation systems such as compilers, which accept language descriptions and carry out arbitrary transformations on those languages, are direct implementations of general metaprogramming. This allows metaprogramming to be applied to virtually any target language, regardless of whether that target language has any metaprogramming abilities of its own. One example is using constructs that are part of the Scheme language to extend C and work around some of C's limitations. [7]
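The second approach, dynamic execution of code composed from strings, can be sketched in a POSIX shell with the built-in `eval`. In this minimal illustration the variables `a`, `b`, and `op` are hypothetical; each pass of the loop builds a different arithmetic assignment as text and then executes that text as shell code.

```sh
#!/bin/sh
# Sketch: build shell code as a string at runtime, then run it with eval.
# a, b and op are illustrative variables, not part of any real API.
a=6
b=7

for op in + - '*'; do
    code="result=\$(( a $op b ))"   # e.g. the text: result=$(( a + b ))
    eval "$code"                    # execute that text as shell code
    echo "$code -> $result"
done
```

Because the operator is spliced into the code before it is parsed, the same loop body performs addition, subtraction, and multiplication; the last pass leaves `result` set to 42.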

Lisp is probably the quintessential language with metaprogramming facilities, both because of its historical precedence and because of the simplicity and power of its metaprogramming. In Lisp metaprogramming, the unquote operator (typically a comma) introduces code that is evaluated at program definition time rather than at run time. The metaprogramming language is thus identical to the host programming language, and existing Lisp routines can be directly reused for metaprogramming if desired. This approach has been implemented in other languages by incorporating an interpreter in the program, which works directly with the program's data. There are implementations of this kind for some common high-level languages, such as RemObjects Pascal Script for Object Pascal.

Usages

Code generation

A simple example of a metaprogram is this POSIX Shell script, which is an example of generative programming:

#!/bin/sh
# metaprogram
echo '#!/bin/sh' > program
for i in $(seq 992)
do
    echo "echo $i" >> program
done
chmod +x program

This script (or program) generates a new 993-line program (one shebang line followed by 992 echo commands) that prints out the numbers 1–992. This is only an illustration of how to use code to write more code; it is not the most efficient way to print out a list of numbers. Nonetheless, a programmer can write and execute this metaprogram in less than a minute, and will have generated nearly 1,000 lines of code in that amount of time.
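The generator can be checked end to end with a smaller count. The following is a sketch, not the original script: the count `n` is a hypothetical parameter, the output goes to a temporary directory (assuming a `mktemp` utility is available), and a `while` loop replaces `seq`, which POSIX does not specify.

```sh
#!/bin/sh
# Sketch: the same generative idea with a hypothetical count n, written into
# a temporary directory that is removed when the script exits.
n=5
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
gen="$dir/program"

echo '#!/bin/sh' > "$gen"
i=1
while [ "$i" -le "$n" ]; do
    echo "echo $i" >> "$gen"       # one generated line per number
    i=$((i + 1))
done
chmod +x "$gen"

wc -l < "$gen"                     # n + 1 lines: the shebang plus n echo commands
sh "$gen"                          # run the generated program: prints 1 to 5
```

Running the generated file through `sh` rather than directly keeps the sketch working on systems that mount temporary directories with execution disabled.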

A quine is a special kind of metaprogram that produces its own source code as its output. Quines are generally of recreational or theoretical interest only.
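As an illustration, here is one minimal form of a POSIX shell quine, offered as a sketch rather than a canonical one. The variable `s` holds a template of the whole line, and `printf` fills that template with `s` itself, using the octal escape `\47` in its format string to produce each single quote. Run with `sh`, it prints its own single line (without a trailing newline). It carries no comments, since any comment would also have to be reproduced in the output.

```sh
s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s"
```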

Not all metaprogramming involves generative programming. If programs are modifiable at runtime, or if incremental compiling is available (such as in C#, Forth, Frink, Groovy, JavaScript, Lisp, Elixir, Lua, Nim, Perl, PHP, Python, Rebol, Ruby, Rust, R, SAS, Smalltalk, and Tcl), then techniques can be used to perform metaprogramming without generating source code.
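Even a POSIX shell lets a running script change its own definitions without writing out any new source text. In the sketch below, the hypothetical function `greet` performs a one-time setup and then replaces its own definition, so later calls skip the setup. The counter `setup_runs` is also hypothetical and exists only to make the effect visible.

```sh
#!/bin/sh
# Sketch: a function that rebinds its own name the first time it runs.
# greet and setup_runs are illustrative names only.
setup_runs=0

greet() {
    setup_runs=$((setup_runs + 1))   # stands in for costly one-time setup
    greeting=Hello
    greet() {                        # the new definition replaces the old one
        echo "$greeting, $1"
    }
    greet "$1"                       # hand this first call to the new body
}

greet Ada      # runs the setup, then prints "Hello, Ada"
greet Grace    # uses the replaced definition: prints "Hello, Grace"
echo "setup ran $setup_runs time(s)"
```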

One style of generative approach is to employ domain-specific languages (DSLs). A fairly common example of using DSLs involves generative metaprogramming: lex and yacc, two tools used to generate lexical analysers and parsers, let the user describe the language using regular expressions and context-free grammars, and embed the complex algorithms required to efficiently parse the language.
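lex and yacc themselves are too large for a short example, but the generative pattern behind them (describe *what* in a small language, let a tool emit the code for *how*) can be shown in miniature. In this toy sketch, the one-rule-per-line `name:message` format is an invented mini-language, not anything lex or yacc accept. Each rule is translated into a shell function definition, and the generated file is then loaded with the `.` command. It assumes rule names are valid function names, messages contain no single quotes, and a `mktemp` utility is available.

```sh
#!/bin/sh
# Toy sketch of generative metaprogramming from a tiny, made-up DSL.
spec='greet:Hello from generated code
farewell:Goodbye'

generated=$(mktemp)
printf '%s\n' "$spec" | while IFS=: read -r name message; do
    # Emit one shell function definition per rule.
    printf "%s() { echo '%s'; }\n" "$name" "$message"
done > "$generated"

. "$generated"      # load the generated definitions into this shell
rm -f "$generated"

greet               # prints: Hello from generated code
farewell            # prints: Goodbye
```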

Code instrumentation

One usage of metaprogramming is to instrument programs in order to do dynamic program analysis.
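A very small form of such instrumentation can be sketched in a POSIX shell: a hypothetical helper `instrument` uses `eval` to generate, at runtime, a wrapper function with the same name as a command. The wrapper counts calls in a variable named `calls_<command>` and then runs the real command through `command`, which bypasses shell functions. The sketch assumes the command name is also a valid shell identifier.

```sh
#!/bin/sh
# Sketch: generate a counting wrapper around an existing command.
# The generated wrapper increments calls_<name>, then runs the original
# command via `command`, so it does not call itself recursively.
instrument() {
    eval "
        calls_$1=0
        $1() {
            calls_$1=\$((calls_$1 + 1))
            command $1 \"\$@\"
        }"
}

instrument printf
printf '%s\n' first
printf '%s\n' second
echo "printf was called $calls_printf times"
```

The program's own calls to `printf` are unchanged; only the generated wrapper observes them, which is the essence of instrumenting code for dynamic analysis.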

Challenges

Some argue that there is a steep learning curve to making full use of metaprogramming features. [8] Because metaprogramming gives more flexibility and configurability at runtime, misuse or incorrect use of it can result in unwarranted and unexpected errors that can be extremely difficult for an average developer to debug. It can introduce risks into the system and make it more vulnerable if not used with care. Common problems arising from incorrect use of metaprogramming include the compiler being unable to identify missing configuration parameters, and invalid or incorrect data resulting in unknown exceptions or different results. [9] For this reason, some believe [8] that only highly skilled developers should work on features that exercise metaprogramming in a language or platform, and that average developers should learn to use these features as part of convention.
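One concrete risk arises when data is spliced into code that is later executed. In this sketch the value of `user_input` is hypothetical, standing in for anything read from a user or a configuration file; with `eval`, the `;` in that value ends one command and starts another that the programmer never wrote.

```sh
#!/bin/sh
# Sketch: the same value handled as code (unsafe) and as data (safer).
user_input='world; echo INJECTED'

# Unsafe: eval runs the text "echo hello world; echo INJECTED",
# which is two commands.
eval "echo hello $user_input"

# Safer: the value is expanded inside quotes and never parsed as code.
echo "hello $user_input"
```

The first form prints `hello world` followed by `INJECTED`; the second prints the value verbatim on one line.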

Uses in programming languages

Macro systems

Macro assemblers

The IBM/360 and derivatives had powerful macro assembler facilities that were often used to generate complete assembly language programs [citation needed] or sections of programs (for different operating systems, for instance). The CICS transaction processing system provided assembler macros that generated COBOL statements as a pre-processing step.

Other assemblers, such as MASM, also support macros.

Metaclasses

Metaclasses are provided by the following programming languages:

Template metaprogramming

Staged metaprogramming

Dependent types

Use of dependent types allows proving that generated code is never invalid. [15] However, this approach is leading-edge and rarely found outside of research programming languages.

Implementations

The list of notable metaprogramming systems is maintained at List of program transformation systems.

See also

References

  1. Sondergaard, Harald (2013). "Course on Program Analysis and Transformation". Retrieved 18 September 2014.
  2. Czarnecki, Krzysztof; Eisenecker, Ulrich W. (2000). Generative Programming. Addison Wesley. ISBN 0-201-30977-7.
  3. Walker, Max. "The Art of Metaprogramming in Java". New Circle. Retrieved 28 January 2014.
  4. Krauss, Aaron. "Programming Concepts: Type Introspection and Reflection". The Societa. Retrieved 14 September 2014.
  5. Joshi, Prateek (5 April 2014). "What Is Metaprogramming? – Part 2/2". Perpetual Enigma. Retrieved 14 August 2014.
  6. For example, instance_eval in Ruby takes a string or an anonymous function. "Rdoc for Class: BasicObject (Ruby 1.9.3) - instance_eval". Retrieved 30 December 2011.
  7. "Art of Metaprogramming". IBM.
  8. Bicking, Ian. "The challenge of metaprogramming". IanBicking.org. Retrieved 21 September 2016.
  9. Terry, Matt (21 August 2013). "Beware of Metaprogramming". Medium.com. Medium Corporation. Retrieved 21 August 2014.
  10. Through Common Lisp Object System's "Meta Object Protocol".
  11. "C++ Template Metaprogramming". aszt.inf.elte.hu. Retrieved 23 July 2022.
  12. Lisp (programming language) "Self-evaluating forms and quoting", quasi-quote operator.
  13. "LMS: Program Generation and Embedded Compilers in Scala". scala-lms.github.io. Retrieved 6 December 2017.
  14. Rompf, Tiark; Odersky, Martin (June 2012). "Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs". Communications of the ACM. 55 (6): 121–130. doi:10.1145/2184319.2184345. ISSN 0001-0782. S2CID 52898203.
  15. Chlipala, Adam (June 2010). "Ur: statically-typed metaprogramming with type-level record computation" (PDF). ACM SIGPLAN Notices. PLDI '10. 45 (6): 122–133. doi:10.1145/1809028.1806612. Retrieved 29 August 2012.