Homoiconicity

Last updated

In computer programming, homoiconicity (from the Greek words homo- meaning "the same" and icon meaning "representation") is a property of some programming languages. A language is homoiconic if a program written in it can be manipulated as data using the language. [1] The program's internal representation can thus be inferred just by reading the program itself. This property is often summarized by saying that the language treats code as data.

Contents

In a homoiconic language, the primary representation of programs is also a data structure in a primitive type of the language itself. [1] This makes metaprogramming easier than in a language without this property: reflection in the language (examining the program's entities at runtime) depends on a single, homogeneous structure, and it does not have to handle several different structures that would appear in a complex syntax. Homoiconic languages typically include full support of syntactic macros, allowing the programmer to express transformations of programs in a concise way.

A commonly cited example is Lisp, which was created to allow for easy list manipulations and where the structure is given by S-expressions that take the form of nested lists, and can be manipulated by other Lisp code. [2] Other examples are the programming languages Clojure (a contemporary dialect of Lisp), Rebol (also its successor Red), Refal, Prolog, and possibly Julia (see the section “Implementation methods” for more details).

History

The term first appeared in connection with the TRAC programming language, developed by Calvin Mooers: [3]

One of the main design goals was that the input script of TRAC (what is typed in by the user) should be identical to the text which guides the internal action of the TRAC processor. In other words, TRAC procedures should be stored in memory as a string of characters exactly as the user typed them at the keyboard. If the TRAC procedures themselves evolve new procedures, these new procedures should also be stated in the same script. The TRAC processor in its action interprets this script as its program. In other words, the TRAC translator program (the processor) effectively converts the computer into a new computer with a new program language -- the TRAC language. At any time, it should be possible to display program or procedural information in the same form as the TRAC processor will act upon it during its execution. It is desirable that the internal character code representation be identical to, or very similar to, the external code representation. In the present TRAC implementation, the internal character representation is based upon ASCII. Because TRAC procedures and text have the same representation inside and outside the processor, the term homoiconic is applicable, from homo meaning the same, and icon meaning representation.

The last sentence above is annotated with footnote 4, which gives credit for the origin of the term: [lower-alpha 1]

Following suggestion of McCullough W. S., based upon terminology due to Peirce, C. S.

The researchers implicated in this quote might be neurophysiologist and cybernetician Warren Sturgis McCulloch (note the difference in the surname from the note) and philosopher, logician and mathematician Charles Sanders Peirce. [5] Pierce indeed used the term "icon" in his Semiotic Theory. According to Peirce, there are three kinds of sign in communication: the icon, the index and the symbol. The icon is the simplest representation: an icon physically resembles that which it denotes.

Alan Kay used and possibly popularized the term "homoiconic" through his use of the term in his 1969 PhD thesis: [6]

A notable group of exceptions to all the previous systems are Interactive LISP [...] and TRAC. Both are functionally oriented (one list, the other string), both talk to the user with one language, and both are "homoiconic" in that their internal and external representations are essentially the same. They both have the ability to dynamically create new functions which may then be elaborated at the users's pleasure. Their only great drawback is that programs written in them look like King Burniburiach's letter to the Sumerians done in Babylonian cuniform! [...]

Uses and advantages

One advantage of homoiconicity is that extending the language with new concepts typically becomes simpler, as data representing code can be passed between the meta and base layer of the program. The abstract syntax tree of a function may be composed and manipulated as a data structure in the meta layer, and then evaluated. It can be much easier to understand how to manipulate the code since it can be more easily understood as simple data (since the format of the language itself is as a data format).

A typical demonstration of homoiconicity is the meta-circular evaluator.

Implementation methods

All Von Neumann architecture systems, which includes the vast majority of general purpose computers today, can implicitly be described as homoiconic due to the way that raw machine code executes in memory, the data type being bytes in memory. However, this feature can also be abstracted to the programming language level.

Languages such as Lisp and its dialects, [7] such as Scheme, [8] Clojure, and Racket employ S-expressions to achieve homoiconicity, and are considered the "Purest" forms of homoiconicity, as these languages use the same representation for both data and code.

Other languages provide data structures for easily and efficiently manipulating code. Notable examples of this weaker form of homoiconicity include Julia, Nim, and Elixir.

Languages often considered to be homoiconic include:

In Lisp

Lisp uses S-expressions as an external representation for data and code. S-expressions can be read with the primitive Lisp function READ. READ returns Lisp data: lists, symbols, numbers, strings. The primitive Lisp function EVAL uses Lisp code represented as Lisp data, computes side-effects and returns a result. The result will be printed by the primitive function PRINT, which creates an external S-expression from Lisp data.

Lisp data, a list using different data types: (sub)lists, symbols, strings and integer numbers.

((:name"john":age20)(:name"mary":age18)(:name"alice":age22))

Lisp code. The example uses lists, symbols and numbers.

(*(sin1.1)(cos2.03)); in infix: sin(1.1)*cos(2.03)

Create above expression with the primitive Lisp function LIST and set the variable EXPRESSION to the result

(setfexpression(list'*(list'sin1.1)(list'cos2.03)))->(*(SIN1.1)(COS2.03)); Lisp returns and prints the result(thirdexpression); the third element of the expression->(COS2.03)

Change the COS term to SIN

(setf(first(thirdexpression))'SIN); The expression is now (* (SIN 1.1) (SIN 2.03)).

Evaluate the expression

(evalexpression)->0.7988834

Print the expression to a string

(print-to-stringexpression)->"(* (SIN 1.1) (SIN 2.03))"

Read the expression from a string

(read-from-string"(* (SIN 1.1) (SIN 2.03))")->(*(SIN1.1)(SIN2.03)); returns a list of lists, numbers and symbols

In Prolog

1?-Xis2*5.X=10.2?-L=(Xis2*5),write_canonical(L).is(_,*(2,5))L=(Xis2*5).3?-L=(ten(X):-(Xis2*5)),write_canonical(L).:-(ten(A),is(A,*(2,5)))L=(ten(X):-Xis2*5).4?-L=(ten(X):-(Xis2*5)),assert(L).L=(ten(X):-Xis2*5).5?-ten(X).X=10.6?-

On line 4 we create a new clause. The operator :- separates the head and the body of a clause. With assert/1* we add it to the existing clauses (add it to the "database"), so we can call it later. In other languages we would call it "creating a function during runtime". We can also remove clauses from the database with abolish/1, or retract/1.

* The number after the clause's name is the number of arguments it can take. It is also called arity.

We can also query the database to get the body of a clause:

7?-clause(ten(X),Y).Y=(Xis2*5).8?-clause(ten(X),Y),Y=(XisZ).Y=(Xis2*5),Z=2*5.9?-clause(ten(X),Y),call(Y).X=10,Y=(10is2*5).

call is analogous to Lisp's eval function.

In Rebol

The concept of treating code as data and the manipulation and evaluation thereof can be demonstrated very neatly in Rebol. (Rebol, unlike Lisp, does not require parentheses to separate expressions).

The following is an example of code in Rebol (Note that >> represents the interpreter prompt; spaces between some elements have been added for readability):

>> repeati3 [ print [ i"hello" ] ]  1 hello 2 hello 3 hello

(repeat is in fact a built-in function in Rebol and is not a language construct or keyword).

By enclosing the code in square brackets, the interpreter does not evaluate it, but merely treats it as a block containing words:

[ repeati3 [ print [ i"hello" ] ] ] 

This block has the type block! and can furthermore be assigned as the value of a word by using what appears to be a syntax for assignment, but is actually understood by the interpreter as a special type (set-word!) and takes the form of a word followed by a colon:

>> block1: [ repeati3 [ print [ i"hello" ] ] ] ;; Assign the value of the block to the word `block1` == [repeat i 3 [print [i "hello"]]] >> type?block1 ;; Evaluate the type of the word `block1` == block!

The block can still be interpreted by using the do function provided in Rebol (similar to eval in Lisp).

It is possible to interrogate the elements of the block and change their values, thus altering the behavior of the code if it were to be evaluated:

>> block1/3 ;; The third element of the block == 3 >> block1/3:5 ;; Set the value of the 3rd element to 5 == 5 >> probeblock1 ;; Show the changed block == [repeat i 5 [print [i "hello"]]] >> doblock1 ;; Evaluate the block 1 hello 2 hello 3 hello 4 hello 5 hello

See also

Notes

  1. Previous versions of this Wikipedia page 2006-2023 conflated the unrelated note 5 of the above TRAC paper, resulting in the wrong claim that the term "homoiconic" had origins in a paper on macro processing by Douglas McIlroy. That paper mentions no terminology even remotely similar; its influence on the TRAC work is of a different nature. This claim has since been repeated by some sources. [4]

Related Research Articles

<span class="mw-page-title-main">Common Lisp</span> Programming language standard

Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ANSI INCITS 226-1994 (S2018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

<span class="mw-page-title-main">Lisp (programming language)</span> Programming language family

Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1960, Lisp is the third-oldest high-level programming language still in common use, after Fortran and COBOL. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Common Lisp, Scheme, Racket, and Clojure.

Rebol is a cross-platform data exchange language and a multi-paradigm dynamic programming language designed by Carl Sassenrath for network communications and distributed computing. It introduces the concept of dialecting: small, optimized, domain-specific languages for code and data, which is also the most notable property of the language according to its designer Carl Sassenrath:

Although it can be used for programming, writing functions, and performing processes, its greatest strength is the ability to easily create domain-specific languages or dialects

<span class="mw-page-title-main">Scheme (programming language)</span> Dialect of Lisp

Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT Computer Science and Artificial Intelligence Laboratory and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It was the first dialect of Lisp to choose lexical scope and the first to require implementations to perform tail-call optimization, giving stronger support for functional programming and associated techniques such as recursive algorithms. It was also one of the first programming languages to support first-class continuations. It had a significant influence on the effort that led to the development of Common Lisp.

<span class="mw-page-title-main">S-expression</span> Data serialization format

In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter’s Virtual Machine.

In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.

Io is a pure object-oriented programming language inspired by Smalltalk, Self, Lua, Lisp, Act1, and NewtonScript. Io has a prototype-based object model similar to the ones in Self and NewtonScript, eliminating the distinction between instance and class. Like Smalltalk, everything is an object and it uses dynamic typing. Like Lisp, programs are just data trees. Io uses actors for concurrency.

Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse or transform other programs, and even modify itself while running. In some cases, this allows programmers to minimize the number of lines of code to express a solution, in turn reducing development time. It also allows programs a greater flexibility to efficiently handle new situations without recompilation.

In some programming languages, eval, short for the English evaluate, is a function which evaluates a string as though it were an expression in the language, and returns a result; in others, it executes multiple lines of code as though they had been included instead of the line including the eval. The input to eval is not necessarily a string; it may be structured representation of code, such as an abstract syntax tree, or of special type such as code. The analog for a statement is exec, which executes a string as if it were a statement; in some languages, such as Python, both are present, while in other languages only one of either eval or exec is.

In computing, compile-time function execution is the ability of a compiler, that would normally compile a function to machine code and execute it at run time, to execute the function at compile time. This is possible if the arguments to the function are known at compile time, and the function does not make any reference to or attempt to modify any global state.

Refal "is a functional programming language oriented toward symbolic computations", including "string processing, language translation, [and] artificial intelligence". It is one of the oldest members of this family, first conceived of in 1966 as a theoretical tool, with the first implementation appearing in 1968. Refal was intended to combine mathematical simplicity with practicality for writing large and sophisticated programs.

In Lisp programming languages, a fexpr is a function whose operands are passed to it without being evaluated. When a fexpr is called, only the body of the fexpr is evaluated; no other evaluations take place except when explicitly initiated by the fexpr. In contrast, when an ordinary Lisp function is called, the operands are evaluated automatically, and only the results of these evaluations are provided to the function; and when a (traditional) Lisp macro is called, the operands are passed in unevaluated, but whatever result the macro function returns is automatically evaluated.

In computing science and informatics, nesting is where information is organized in layers, or where objects contain other similar objects. It almost always refers to self-similar or recursive structures in some sense.

A symbol in computer programming is a primitive data type whose instances have a human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms. Uniqueness is enforced by holding them in a symbol table. The most common use of symbols by programmers is to perform language reflection, and the most common indirectly is their use to create object linkages.

Nemerle is a general-purpose, high-level, statically typed programming language designed for platforms using the Common Language Infrastructure (.NET/Mono). It offers functional, object-oriented, aspect-oriented, reflective and imperative features. It has a simple C#-like syntax and a powerful metaprogramming system.

<span class="mw-page-title-main">Red (programming language)</span> Computer programming language released in 2011

Red is a programming language designed to overcome the limitations of the programming language Rebol. Red was introduced in 2011 by Nenad Rakočević, and is both an imperative and functional programming language. Its syntax and general usage overlaps that of the interpreted Rebol language.

In computer science, the expressions code as data and data as code refer to the interchangeable nature of code and data. Specifically, "code is data" refers to the idea that source code written in a programming language can be manipulated as data, such as a sequence of characters or an abstract syntax tree (AST), and it has an execution semantics only in the context of a given compiler or interpreter. The expression "data is code" refers to the idea that an arbitrary data structure such as a list of integers can be interpreted or compiled using a specialized language semantics. The notions are often used in the context of Lisp-like languages that use S-expressions as their main syntax, as writing programs using nested lists of symbols makes the interpretation of the program as an AST quite transparent.

<span class="mw-page-title-main">LFE (programming language)</span>

Lisp Flavored Erlang (LFE) is a functional, concurrent, garbage collected, general-purpose programming language and Lisp dialect built on Core Erlang and the Erlang virtual machine (BEAM). LFE builds on Erlang to provide a Lisp syntax for writing distributed, fault-tolerant, soft real-time, non-stop applications. LFE also extends Erlang to support metaprogramming with Lisp macros and an improved developer experience with a feature-rich read–eval–print loop (REPL). LFE is actively supported on all recent releases of Erlang; the oldest version of Erlang supported is R14.

References

  1. 1 2 Ceravola, Antonello; Joublin, Frank (2021). "From JSON to JSEN through Virtual Languages". Open Journal of Web Technologies. 8 (1): 1–15. In a homoiconic language, the primary representation of programs is also a data structure in a primitive type of the language itself
  2. Wheeler, David A. "Readable Lisp S-expressions".
  3. Mooers, C.N.; Deutsch, L.P. (1965). "TRAC, A Text-Handling Language". Proceeding ACM '65 Proceedings of the 1965 20th national conference. pp. 229–246. doi:10.1145/800197.806048.
  4. Mannaert, Herwig; McGroarty, Chris; Gallant, Scott; De Cock, Koen; Gallogly, Jim; Raval, Anup; Snively, Keith (2022). "Toward Scalable Collaborative Metaprogramming: A Case Study to Integrate Two Metaprogramming Environments" (PDF). International Journal on Advances in Software. 15: 128–140. ISSN   1942-2628.
  5. "Don't say "Homoiconic"". Expressions of Change. 1 March 2018.
  6. Kay, Alan (1969). The Reactive Engine (PhD). University of Utah.
  7. 1 2 3 4 5 6 7 8 9 Homoiconic Languages
  8. 1 2 Homoiconic languages (archived), in true Blue blog at Oracle
  9. "Lispy Elixir". 8thlight.com. Elixir, on the surface, is not homoiconic. However, the syntax on the surface is just a facade for the homoiconic structure underneath.
  10. "Why we created Julia". julialang.org. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab.
  11. "metaprogramming". docs.julialang.org. Like Lisp, Julia represents its own code as a data structure of the language itself.
  12. Shapiro, Ehud Y.; Sterling, Leon (1994). The art of Prolog: advanced programming techniques. MIT Press. ISBN   0-262-19338-8.
  13. Ramsay, S.; Pytlik-Zillig, B. (2012). "Code-Generation Techniques for XML Collections Interoperability". dh2012 Digital Humanities Conference Proceedings.
  14. "Notes for Programming Language Experts". Wolfram Language. Wolfram. 2017.