Code as data

In computer science, the expression code as data refers to the idea that source code written in a programming language can be manipulated as data, such as a sequence of characters or an abstract syntax tree (AST), and that it acquires execution semantics only in the context of a given compiler or interpreter. [1] The notion is most often invoked for Lisp-like languages that use S-expressions as their main syntax: because programs are written as nested lists of symbols, their interpretation as an AST is transparent (a property known as homoiconicity). [2] [3]
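
The idea can be illustrated outside Lisp as well. The following is a minimal sketch using Python's standard `ast` module, chosen here only for illustration; the expression, the variable names `price` and `quantity`, and the `SwapAddForSub` class are hypothetical. The source text is parsed into an AST, the tree is rewritten like any other data structure, and it gains a meaning only once it is compiled and evaluated.

```python
import ast

# The program starts out as plain data: a sequence of characters.
source = "price * quantity + 1"
env = {"price": 3, "quantity": 4}

# Parsing turns the characters into another data structure, an AST.
tree = ast.parse(source, mode="eval")

# Evaluate the untouched tree first, for comparison.
original = eval(compile(tree, "<ast>", "eval"), dict(env))


class SwapAddForSub(ast.NodeTransformer):
    """Rewrite every addition in the tree into a subtraction."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


# The tree is manipulated as data, then handed back to the compiler.
new_tree = ast.fix_missing_locations(SwapAddForSub().visit(tree))
result = eval(compile(new_tree, "<ast>", "eval"), dict(env))

print(original, result)
```

In a homoiconic language the parsing step is largely unnecessary, because the program text already is the nested-list structure being manipulated.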

These ideas are generally used in metaprogramming, the practice of writing programs that treat other programs as their data. [4] [5] For example, code-as-data allows first-class functions to be serialized in a portable manner. [6] Another use case is storing a program in a string, which is then processed by a compiler to produce an executable. [4] More often, a reflection API exposes the structure of a program as an object within the language, reducing the possibility of creating a malformed program. [7]
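
The string-based approach can be sketched in Python with the built-in `compile` and `exec` functions. This is an illustrative example under stated assumptions, not the method of the cited works; the helper names `make_power_function` and `is_well_formed` are hypothetical. It also shows why string templates are fragile: a malformed program is detected only when the string reaches the compiler.

```python
def make_power_function(exponent: int):
    """Generate source text for a specialised function, then compile it."""
    # The program is held as an ordinary string until it is compiled.
    source = f"def power(x):\n    return x ** {exponent}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["power"]


def is_well_formed(source: str) -> bool:
    """Report whether a program held in a string is syntactically valid."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        # Nothing prevents a string from containing a broken program;
        # the error surfaces only at compile time.
        return False


cube = make_power_function(3)
print(cube(2))
print(is_well_formed("x = 1"), is_well_formed("def f(:"))
```

A structured representation such as an AST object avoids this class of error, because syntactically invalid trees are harder to construct by accident.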

In computability theory, Kleene's second recursion theorem provides a form of code-is-data by proving that a program can have access to its own source code. [8]
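
A quine-style construction gives an informal feel for what the theorem guarantees. The Python sketch below is an illustration only, not a proof and not taken from the cited notes; the function name `self_source` is hypothetical. The function builds its own source text from a template that contains a placeholder for itself, without reading any file.

```python
# self_source returns, as a string, exactly the three lines that define it.
# The template holds the function's text with a %r placeholder where the
# template's own quoted form belongs; filling it in closes the loop.
def self_source() -> str:
    template = 'def self_source() -> str:\n    template = %r\n    return template %% template\n'
    return template % template


# Treat the returned text as a program: executing it defines a function
# that in turn returns the same text.
namespace = {}
exec(self_source(), namespace)
regenerated = namespace["self_source"]()

print(regenerated == self_source())
```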

Code-as-data is also a principle of the von Neumann architecture, since stored programs and data are both represented as bits in the same memory device. [4] This architecture makes self-modifying code possible. [citation needed] It also creates a security risk: a malicious program can be disguised as user data, and an exploit can then direct execution to it. [9]
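
The shared-memory principle can be sketched with a toy stored-program machine. The instruction set, encoding, and memory layout below are invented for illustration and do not correspond to any real processor. Instructions and data live in the same Python list, so a `STORE` can overwrite the operand of a later instruction.

```python
# Hypothetical opcodes for a toy accumulator machine.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory: list) -> int:
    """Execute (opcode, operand) pairs stored in `memory`, starting at 0."""
    acc, pc = 0, 0
    while True:
        op, arg = memory[pc], memory[pc + 1]  # fetch from the shared memory
        pc += 2
        if op == HALT:
            return acc
        elif op == LOAD:
            acc = memory[arg]
        elif op == ADD:
            acc += memory[arg]
        elif op == STORE:
            memory[arg] = acc  # may overwrite data or an instruction
        else:
            raise ValueError(f"unknown opcode {op}")


program = [
    LOAD, 10,   # addr 0: acc = memory[10], which holds the address 11
    STORE, 5,   # addr 2: overwrite the operand of the instruction at addr 4
    LOAD, 0,    # addr 4: as written loads memory[0]; after patching, memory[11]
    HALT, 0,    # addr 6: stop and return the accumulator
    0, 0,       # addr 8: padding
    11,         # addr 10: data, used as an address
    42,         # addr 11: data
]

print(run(program))
```

The same property that lets the program patch itself is what allows data supplied by a user to end up being executed as instructions.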

Data as Code

In declarative programming, the Data as Code (DaC) principle refers to the idea that an arbitrary data structure can be exposed using a specialized language semantics or API. For example, a list or a string is data, but a list in Lisp or a string in Perl can be entered directly and evaluated as code. [1] Configuration scripts, domain-specific languages and markup languages are cases where program execution is controlled by data elements that are not clearly sequences of commands. [10] [11]
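
A small evaluator shows how plain data can be given a specialized semantics. In this Python sketch the operation names, the `OPERATIONS` table, and the `evaluate` function are hypothetical, chosen only for illustration. A nested list read from JSON contains no commands in the usual sense, yet it fully determines what the program computes.

```python
import json
import operator

# Assumed vocabulary of the miniature language: names mapped to functions.
OPERATIONS = {
    "add": operator.add,
    "mul": operator.mul,
    "neg": operator.neg,
}


def evaluate(expr):
    """Interpret a nested list of the form [operation, argument, ...]."""
    if isinstance(expr, (int, float)):
        return expr  # a bare number evaluates to itself
    name, *args = expr
    return OPERATIONS[name](*(evaluate(arg) for arg in args))


# The "program" arrives as ordinary data, here a JSON document.
document = '["add", 1, ["mul", 2, ["neg", 3]]]'
value = evaluate(json.loads(document))

print(value)
```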

References

  1. Poletto, Massimiliano A. (September 1999). Language and compiler support for dynamic code generation (PDF) (PhD). MIT. p. 20. "until it is dynamically compiled, dynamic code is data. Similarly, lists in Lisp and strings in Perl are data, but they can be evaluated as code"
  2. Plusch, Mike (February 2004). "ConciseXML builds upon the important qualities of XML and S-Expressions". XML Journal. Gale Academic OneFile. 5 (2): 20+. Retrieved 14 January 2023. "S-Expressions, or symbolic expressions, is the syntax behind Lisp-like languages, including Scheme. Basically, S-Expressions are nested lists of symbols. S-Expressions are used with languages that support the notion that code is data."
  3. Riehl, Jonathan (22 October 2006). "Assimilating MetaBorg: Embedding language tools in languages". Proceedings of the 5th international conference on Generative programming and component engineering. pp. 21–28. doi:10.1145/1173706.1173710. ISBN 1595932372. S2CID 11111101. "The Lisp and Scheme communities are an exception, since they tend to hold closely to the idea that code is data, and implement a large portion of their language in a smaller core language."
  4. Klöckner, Andreas; Pinto, Nicolas; Lee, Yunsup; Catanzaro, Bryan; Ivanov, Paul; Fasih, Ahmed (March 2012). "PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation". Parallel Computing. 38 (3): 157–174. arXiv:0911.3456. doi:10.1016/j.parco.2011.09.001. S2CID 18928397.
  5. Wu, Chaur (2010). "Metaprogramming". Pro DLR in .NET 4. pp. 185–210. doi:10.1007/978-1-4302-3067-0_8. ISBN 978-1-4302-3066-3.
  6. Tack, Guido; Kornstaedt, Leif; Smolka, Gert (March 2006). "Generic Pickling and Minimization". Electronic Notes in Theoretical Computer Science. 148 (2): 79–103. doi:10.1016/j.entcs.2005.11.041.
  7. VanderHart, Luke; Sierra, Stuart (2010). "Macros and Metaprogramming". Practical Clojure. pp. 167–178. doi:10.1007/978-1-4302-7230-4_12. ISBN 978-1-4302-7231-1.
  8. Panangaden, Prakash. "Notes on the recursion theorem" (PDF). COMP 330 Theory of Computation. McGill University. Retrieved 15 January 2023.
  9. Böhme, Rainer; Moore, Tyler (26 August 2013). "A Brief Introduction to Information Security" (PDF).
  10. https://arxiv.org/abs/2401.10603
  11. https://github.com/shuttle-hq/synth