Extensible programming

Last updated

Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this style of programming, were an active area of work in the 1960s, but the movement was marginalized in the 1970s. [1] Extensible programming has become a topic of renewed interest in the 21st century. [2]

Contents

Historical movement

The first paper usually [1] [3] associated with the extensible programming language movement is M. Douglas McIlroy's 1960 paper on macros for higher-level programming languages. [4] Another early description of the principle of extensibility occurs in Brooker and Morris's 1960 paper on the Compiler-Compiler. [5] The peak of the movement was marked by two academic symposia, in 1969 and 1971. [6] [7] By 1975, a survey article on the movement by Thomas A. Standish [1] was essentially a post mortem. The Forth programming language was an exception, but it went essentially unnoticed.

Character of the historical movement

As typically envisioned, an extensible programming language consisted of a base language providing elementary computing facilities, and a meta-language capable of modifying the base language. A program then consisted of meta-language modifications and code in the modified base language.

The most prominent language-extension technique used in the movement was macro definition. Grammar modification was also closely associated with the movement, resulting in the eventual development of adaptive grammar formalisms. The Lisp language community remained separate from the extensible language community, apparently because, as one researcher observed,

any programming language in which programs and data are essentially interchangeable can be regarded as an extendible [sic] language. ... this can be seen very easily from the fact that Lisp has been used as an extendible language for years. [8]

At the 1969 conference, Simula was presented as an extensible programming language.

Standish described three classes of language extension, which he called paraphrase , orthophrase, and metaphrase (otherwise paraphrase and metaphrase being translation terms).

Death of the historical movement

Standish attributed the failure of the extensibility movement to the difficulty of programming successive extensions. An ordinary programmer might build a single shell of macros around a base language, but if a second shell of macros was to be built around that, the programmer would have to be intimately familiar with both the base language and the first shell; a third shell would require familiarity with the base and both the first and second shells; and so on. (Note that shielding the programmer from lower-level details is the intent of the abstraction movement that supplanted the extensibility movement.)

Despite the earlier presentation of Simula as extensible, by 1975, Standish's survey does not seem in practice to have included the newer abstraction-based technologies (though he used a very general definition of extensibility that technically could have included them). A 1978 history of programming abstraction from the invention of the computer to the (then) present day made no mention of macros, and gave no hint that the extensible languages movement had ever occurred. [9] Macros were tentatively admitted into the abstraction movement by the late 1980s (perhaps due to the advent of hygienic macros), by being granted the pseudonym syntactic abstractions. [10]

Modern movement

In the modern sense, a system that supports extensible programming will provide all of the features described below[ citation needed ].

Extensible syntax

This simply means that the source language(s) to be compiled must not be closed, fixed, or static. It must be possible to add new keywords, concepts, and structures to the source language(s). Languages which allow the addition of constructs with user defined syntax include Coq, [11] Racket, Camlp4, OpenC++, Seed7, [12] Red, Rebol, and Felix. While it is acceptable for some fundamental and intrinsic language features to be immutable, the system must not rely solely on those language features. It must be possible to add new ones.

Extensible compiler

In extensible programming, a compiler is not a monolithic program that converts source code input into binary executable output. The compiler itself must be extensible to the point that it is really a collection of plugins that assist with the translation of source language input into anything. For example, an extensible compiler will support the generation of object code, code documentation, re-formatted source code, or any other desired output. The architecture of the compiler must permit its users to "get inside" the compilation process and provide alternative processing tasks at every reasonable step in the compilation process.

For just the task of translating source code into something that can be executed on a computer, an extensible compiler should:

Extensible runtime

At runtime, extensible programming systems must permit languages to extend the set of operations that it permits. For example, if the system uses a byte-code interpreter, it must allow new byte-code values to be defined. As with extensible syntax, it is acceptable for there to be some (smallish) set of fundamental or intrinsic operations that are immutable. However, it must be possible to overload or augment those intrinsic operations so that new or additional behavior can be supported.

Content separated from form

Extensible programming systems should regard programs as data to be processed. Those programs should be completely devoid of any kind of formatting information. The visual display and editing of programs to users should be a translation function, supported by the extensible compiler, that translates the program data into forms more suitable for viewing or editing. Naturally, this should be a two-way translation. This is important because it must be possible to easily process extensible programs in a variety of ways. It is unacceptable for the only uses of source language input to be editing, viewing and translation to machine code. The arbitrary processing of programs is facilitated by de-coupling the source input from specifications of how it should be processed (formatted, stored, displayed, edited, etc.).

Source language debugging support

Extensible programming systems must support the debugging of programs using the constructs of the original source language regardless of the extensions or transformation the program has undergone in order to make it executable. Most notably, it cannot be assumed that the only way to display runtime data is in structures or arrays. The debugger, or more correctly 'program inspector', must permit the display of runtime data in forms suitable to the source language. For example, if the language supports a data structure for a business process or work flow, it must be possible for the debugger to display that data structure as a fishbone chart or other form provided by a plugin.


Examples

See also

Related Research Articles

<span class="mw-page-title-main">Assembly language</span> Low-level programming language

In computer programming, assembly language, often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence between the instructions in the language and the architecture's machine code instructions. Assembly language usually has one statement per machine instruction (1:1), but constants, comments, assembler directives, symbolic labels of, e.g., memory locations, registers, and macros are generally also supported.

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Macro (computer science)</span> Rule for substituting a set input with a set output

In computer programming, a macro is a rule or pattern that specifies how a certain input should be mapped to a replacement output. Applying a macro to an input is known as macro expansion. The input and output may be a sequence of lexical tokens or characters, or a syntax tree. Character macros are supported in software applications to make it easy to invoke common command sequences. Token and tree macros are supported in some programming languages to enable code reuse or to extend the language, sometimes for domain-specific languages.

<span class="mw-page-title-main">Programming language</span> Language for communicating instructions to a machine

A programming language is a system of notation for writing computer programs.

<span class="mw-page-title-main">Interpreter (computing)</span> Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter's Virtual Machine.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

<span class="mw-page-title-main">Abstract syntax tree</span> Tree representation of the abstract syntactic structure of source code

An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.

In computer science, a preprocessor is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.

The C preprocessor is the macro preprocessor for several computer programming languages, such as C, Objective-C, C++, and a variety of Fortran languages. The preprocessor provides inclusion of header files, macro expansions, conditional compilation, and line control.

IMP is an early systems programming language that was developed by Edgar T. Irons in the late 1960s through early 1970s, at the National Security Agency (NSA). Unlike most other systems languages, IMP supports syntax-extensible programming.

Harbour is a computer programming language, primarily used to create database/business programs. It is a modernized, open source and cross-platform version of the older Clipper system, which in turn developed from the dBase database market of the 1980s and 1990s.

Camlp4 is a software system for writing extensible parsers for programming languages. It provides a set of OCaml libraries that are used to define grammars as well as loadable syntax extensions of such grammars. Camlp4 stands for Caml Preprocessor and Pretty-Printer and one of its most important applications was the definition of domain-specific extensions of the syntax of OCaml.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

Haxe is a high-level cross-platform programming language and compiler that can produce applications and source code for many different computing platforms from one code-base. It is free and open-source software, released under the MIT License. The compiler, written in OCaml, is released under the GNU General Public License (GPL) version 2.

<span class="mw-page-title-main">History of compiler construction</span>

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

The programming language C# version 3.0 was released on 19 November 2007 as part of .NET Framework 3.5. It includes new features inspired by functional programming languages such as Haskell and ML, and is driven largely by the introduction of the Language Integrated Query (LINQ) pattern to the Common Language Runtime. It is not currently standardized by any standards organisation.

<span class="mw-page-title-main">Device driver synthesis and verification</span>

Device drivers are programs which allow software or higher-level computer programs to interact with a hardware device. These software components act as a link between the devices and the operating systems, communicating with each of these systems and executing commands. They provide an abstraction layer for the software above and also mediate the communication between the operating system kernel and the devices below.

The following outline is provided as an overview of and topical guide to C++:

Seed7 is an extensible general-purpose programming language designed by Thomas Mertes. It is syntactically similar to Pascal and Ada. Along with many other features, it provides an extension mechanism. Seed7 supports introducing new syntax elements and their semantics into the language, and allows new language constructs to be defined and written in Seed7. For example, programmers can introduce syntax and semantics of new statements and user defined operator symbols. The implementation of Seed7 differs significantly from that of languages with hard-coded syntax and semantics.

<span class="mw-page-title-main">Nim (programming language)</span> Programming language

Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level systems programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.

References

  1. 1 2 3 Standish, Thomas A., "Extensibility in Programming Language Design", SIGPLAN Notices 10 no. 7 (July 1975), pp. 18–21.
  2. Gregory V. Wilson, "Extensible Programming for the 21st Century", ACM Queue 2 no. 9 (Dec/Jan 2004–2005).
  3. Sammet, Jean E., Programming Languages: History and Fundamentals, Prentice-Hall, 1969, section III.7.2
  4. McIlroy, M.D., "Macro Instruction Extensions of Compiler Languages", Communications of the ACM 3 no. 4 (April 1960), pp. 214–220.
  5. Brooker, R.A. and Morris, D., "A General Translation Program for Phrase Structure Languages", Journal of the ACM 9 no. 1 (January 1962), pp. 1–10. The paper was received in 1960.
  6. Christensen, C. and Shaw, C.J., eds., Proceedings of the Extensible Languages Symposium, SIGPLAN Notices 4 no. 8 (August 1969).
  7. Schuman, S.A., ed., Proceedings of the International Symposium on Extensible Languages, SIGPLAN Notices 6 no. 12 (December 1971).
  8. Harrison, M.C., in "Panel on the Concept of Extensibility", pp. 53–54 of the 1969 symposium.
  9. Guarino, L.R., "The Evolution of Abstraction in Programming Languages [ dead link ]", CMU-CS-78-120, Department of Computer Science, Carnegie-Mellon University, Pennsylvania, 22 May 1978.
  10. Gabriel, Richard P., ed., "Draft Report on Requirements for a Common Prototyping System", SIGPLAN Notices 24 no. 3 (March 1989), pp. 93ff.
  11. "Syntax extensions and notation scopes — Coq 8.17.0 documentation". coq.inria.fr. Retrieved 2023-05-25.
  12. Zingaro, Daniel, "Modern Extensible Languages", SQRL Report 47 McMaster University (October 2007), page 16.

General

  1. Greg Wilson's Article in ACM Queue
  2. Slashdot Discussion
  3. Modern Extensible Languages - A paper from Daniel Zingaro

Tools

  1. MetaLan extensible programming compiler engine implementation
  2. XPS — eXtensible Programming System (in development)
  3. MPS — JetBrains Metaprogramming system

Programming languages with extensible syntax

  1. OpenZz
  2. xtc — eXTensible C
  3. English-script
  4. Nemerle Macros
  5. Boo Syntactic Macros
  6. Stanford University Intermediate Format compiler
  7. Seed7 - The extensible programming language
  8. Katahdin - a programming language with syntax and semantics that are mutable at runtime
  9. π - another programming language with extensible syntax, implemented using an Earley parser