Literate programming is a programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated. [1] The approach is routinely used in scientific computing and data science for reproducible research and open-access purposes. [2] Literate programming tools are used by millions of programmers today. [3]
The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the compiler, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts. [4] Literate programs are written as an exposition of logic in a natural language, in which macros hide abstractions and traditional source code, so that the program reads more like the text of an essay.
Literate programming (LP) tools are used to obtain two representations from a source file: one understandable by a compiler or interpreter, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source. [5] While the first generation of literate programming tools were specific to one programming language, later ones are language-agnostic and work independently of the underlying programming language.
Literate programming was first introduced in 1984 by Donald Knuth, who intended it to create programs that were suitable literature for human beings. He implemented it at Stanford University as part of his research on algorithms and digital typography. He called the implementation "WEB" because he believed it was one of the few three-letter English words that had not yet been applied to computing. [6] He later noted that the name also suits the approach, since a complex piece of software is best regarded as a web delicately pieced together from simple materials. [1] The practice of literate programming saw a significant resurgence in the 2010s with the use of computational notebooks, especially in data science.
Literate programming consists of writing out the program logic in a human language, with code snippets and macros included and separated from the prose by simple markup. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe the abstractions created while solving the programming problem and that hide chunks of code or lower-level macros. These macros are similar to the algorithms in pseudocode typically used in teaching computer science. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a meta-language on top of the underlying programming language.
A preprocessor expands arbitrary hierarchies, or rather "interconnected 'webs' of macros", [7] to produce the compilable source code with one command ("tangle") and the documentation with another ("weave"). The preprocessor also makes it possible to write out the content of a macro, and to add to an already defined macro, at any place in the literate source file, so the programmer need neither keep in mind the ordering restrictions imposed by traditional programming languages nor interrupt the flow of thought.
According to Knuth, [8] [9] literate programming produces higher-quality programs, since it forces programmers to state explicitly the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on but grows naturally in the process of exposition of one's thoughts during a program's creation. [10] The resulting documentation allows the author to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. The applicability of the concept to large-scale, commercial-grade programs is demonstrated by the publication of the TeX source code as a literate program. [8]
Knuth also claims that literate programming can lead to easy porting of software to multiple environments, and even cites the implementation of TeX as an example. [11]
Literate programming is very often misunderstood [12] to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation.
This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth. [12] [13]
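Implementing literate programming consists of two steps: weaving, which generates a comprehensive document about the program and its maintenance, and tangling, which generates machine-executable code.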
Weaving and tangling are done on the same source so that they are consistent with each other.
A classic example of literate programming is the literate implementation of the standard Unix wc word-counting program. Knuth presented a CWEB version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb literate programming tool. [14] This example provides a good illustration of the basic elements of literate programming.
The following snippet of the wc literate program [14] shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language and hide chunks of code or other macros. The markup notation consists of double angle brackets (<<...>>) that indicate macros. The @ symbol, in a noweb file, indicates the beginning of a documentation chunk. The <<*>> symbol stands for the "root", the topmost node from which the literate programming tool starts expanding the web of macros. In fact, the expanded source code can be written out starting from any section or subsection (that is, a piece of code introduced as <<name of the chunk>>=, with the equal sign), so one literate program file can contain several files of machine source code.

    The purpose of wc is to count lines, words, and/or characters in a list of files. The
    number of lines in a file is ......../more explanations/

    Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
    <<*>>=
    <<Header files to include>>
    <<Definitions>>
    <<Global variables>>
    <<Functions>>
    <<The main program>>
    @

    We must include the standard I/O definitions, since we want to send formatted output
    to stdout and stderr.
    <<Header files to include>>=
    #include <stdio.h>
    @
Chunks can be unraveled at any place in the literate program text, not necessarily in the order in which they are sequenced in the enclosing chunk, but as demanded by the logic reflected in the explanatory text that envelops the whole program.
Macros are not the same as "section names" in standard documentation. Literate programming macros hide the real code behind themselves and can be used inside any low-level construct of the underlying programming language, often inside control-flow statements such as if, while or case. This can be seen in the following excerpt from the wc literate program. [14]

    The present chunk, which does the counting, was actually one of
    the simplest to write. We look at each character and change state if it begins or ends
    a word.
    <<Scan file>>=
    while (1) {
      <<Fill buffer if it is empty; break at end of file>>
      c = *ptr++;
      if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
          word_count++;
          in_word = 1;
        }
        continue;
      }
      if (c == '\n') line_count++;
      else if (c != ' ' && c != '\t') continue;
      in_word = 0;
        /* c is newline, space, or tab */
    }
    @
The macros stand for any chunk of code or other macros, and are more general than top-down or bottom-up "chunking", or than subsectioning. Donald Knuth said that when he realized this, he began to think of a program as a web of various parts. [1]
In a noweb literate program, besides the free order of their exposition, the chunks behind macros, once introduced with <<...>>=, can be grown later at any place in the file by simply writing <<name of the chunk>>= again and adding more content to it, as the following snippet illustrates (the plus sign in "+=" is added by the document formatter for readability and is not in the code). [14]

    The grand totals must be initialized to zero at the beginning of the program.
    If we made these variables local to main, we would have to do this initialization
    explicitly; however, C globals are automatically zeroed. (Or rather, ``statically
    zeroed.'' (Get it?)
    <<Global variables>>+=
    long tot_word_count, tot_line_count,
         tot_char_count;
      /* total number of words, lines, chars */
    @
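The plus sign mentioned above is produced by the weave step, which turns the same literate source into documentation. As a toy illustration only, the following C sketch weaves noweb-style markup into HTML: it assumes the documentation text is already HTML, wraps each code chunk in a <pre> block headed by its escaped name, and writes += instead of = when a chunk continues an earlier one of the same name. The function weave and this output format are hypothetical; real weavers such as noweb's noweave or CWEB's cweave produce richer output, typically with cross-references and an index.

```c
/* Hypothetical toy weave to HTML; not the output format of any real tool. */
#include <assert.h>
#include <string.h>

#define MAX_SEEN 32   /* toy limit on distinct chunk names */
#define MAX_NAME 64

/* Output buffer that truncates instead of overflowing (cap must be > 0). */
struct out { char *buf; size_t cap, len; };

static void emit(struct out *o, const char *s, size_t n)
{
    while (n-- > 0 && o->len + 1 < o->cap)
        o->buf[o->len++] = *s++;
    o->buf[o->len] = '\0';
}

static void put(struct out *o, const char *s) { emit(o, s, strlen(s)); }

/* Copy code text, escaping the characters that are special in HTML. */
static void emit_code(struct out *o, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '<') put(o, "&lt;");
        else if (s[i] == '>') put(o, "&gt;");
        else if (s[i] == '&') put(o, "&amp;");
        else emit(o, &s[i], 1);
    }
}

/* Documentation lines are copied through unchanged; each "<<name>>=" line
   starts a <pre> block, and an "@" line (optionally followed by text)
   closes it.  Returns -1 if a name is too long or too many names appear. */
int weave(const char *src, char *buf, size_t cap)
{
    char seen[MAX_SEEN][MAX_NAME];
    int n_seen = 0, in_code = 0;
    struct out o = { buf, cap, 0 };
    assert(cap > 0);
    buf[0] = '\0';
    while (*src) {
        size_t len = strcspn(src, "\n");
        if (len >= 5 && strncmp(src, "<<", 2) == 0 && strncmp(src + len - 3, ">>=", 3) == 0) {
            char name[MAX_NAME];
            size_t n = len - 5;
            int again = 0;
            if (n >= sizeof name) return -1;
            memcpy(name, src + 2, n);
            name[n] = '\0';
            for (int i = 0; i < n_seen; i++)       /* continuation of a chunk? */
                if (strcmp(seen[i], name) == 0) again = 1;
            if (!again) {
                if (n_seen == MAX_SEEN) return -1;
                memcpy(seen[n_seen++], name, n + 1);
            }
            if (in_code) put(&o, "</pre>\n");
            put(&o, "<pre>&lt;&lt;");
            emit_code(&o, name, n);
            put(&o, again ? "&gt;&gt;+=\n" : "&gt;&gt;=\n");
            in_code = 1;
        } else if (src[0] == '@' && (len == 1 || src[1] == ' ')) {
            if (in_code) put(&o, "</pre>\n");
            in_code = 0;
            if (len > 2) { emit(&o, src + 2, len - 2); put(&o, "\n"); }
        } else {
            if (in_code) emit_code(&o, src, len);
            else emit(&o, src, len);
            put(&o, "\n");
        }
        src += len;
        if (*src == '\n') src++;
    }
    if (in_code) put(&o, "</pre>\n");
    return 0;
}
```

Because this weave and the tangle step read the same source, the chunk names shown in the documentation always match the chunks that end up in the compiled code.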
The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code, a literate program contains the explanation of concepts on each level, with lower-level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such an exposition of ideas creates a flow of thought like that of a literary work. Knuth wrote a "novel" which explains the code of the interactive fiction game Colossal Cave Adventure. [15]
The first published literate programming environment was WEB, introduced by Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's TeX: The Program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. [16] The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems, and can produce TeX and PDF documentation.
Various other implementations of the literate programming concept are listed below. Many of the newer ones lack macros and hence do not follow the principle of ordering code according to human logic, which arguably makes them "semi-literate" tools. They do, however, allow cell-by-cell execution of code, which makes them closer to exploratory programming tools.
Name | Supported languages | Written in | Markup language | Macros & custom order | Cellular execution | Comments |
---|---|---|---|---|---|---|
WEB | Pascal | Pascal | TeX | Yes | No | The first published literate programming environment. |
CWEB | C++ and C | C | TeX | Yes | No | WEB adapted for C and C++. |
NoWEB | Any | C, AWK, and Icon | LaTeX, TeX, HTML and troff | Yes | No | Well known for its simplicity; it allows text formatting in HTML rather than going through the TeX system. |
Literate | Any | D | Markdown | Yes | No | Supports TeX equations. Compatible with Vim (literate.vim). |
FunnelWeb | Any | C | HTML and TeX | Yes? | | Has more complicated markup, but many more flexible options. |
NuWEB | Any | C++ | LaTeX | | | Can translate a single LP source into any number of code files in a single invocation; it does not have separate weave and tangle commands. It does not have the extensibility of noweb. |
pyWeb | Any | Python | reStructuredText | Yes | | Respects indentation, which makes it usable for languages like Python, though it can be used for any programming language. |
Molly | Any | Perl | HTML | | | Aims to modernize and scale literate programming with "folding HTML" and "virtual views" on code. It uses noweb markup for the literate source files. |
Codnar | Ruby | | | | | An inverse literate programming tool available as a Ruby Gem. Instead of the machine-readable source code being extracted from the literate documentation sources, the literate documentation is extracted from the normal machine-readable source code files. |
Emacs org-mode | Any | Emacs Lisp | Plain text | | | Requires Babel, [17] which allows embedding blocks of source code from multiple programming languages [18] within a single text document. Blocks of code can share data with each other, display images inline, or be parsed into pure source code using the noweb reference syntax. [19] |
CoffeeScript | CoffeeScript | CoffeeScript, JavaScript | Markdown | | | CoffeeScript supports a "literate" mode, which enables programs to be compiled from a source document written in Markdown with indented blocks of code. [20] |
Maple worksheets | Maple | | XML | | | Maple worksheets are a platform-agnostic literate programming environment that combines text and graphics with live code for symbolic computation. ("Maple Worksheets". MapleSoft.com. Retrieved May 30, 2020.) |
Wolfram Notebooks | Wolfram Language | Wolfram Language | | | | Wolfram notebooks are a platform-agnostic literate programming method that combines text and graphics with live code. [21] [22] |
Playgrounds | Swift | | | | | Provides an interactive programming environment that evaluates each statement and displays live results as the code is edited. Playgrounds also allow the user to add markup along with the code to provide headers, inline formatting and images. [23] |
Jupyter Notebook (formerly IPython Notebook) | Python and any language with a Jupyter kernel | | JSON format specification for ipynb | No | Yes | Works in the format of notebooks, which combine headings, text (including LaTeX), plots, etc. with the written code. |
Jupytext plugin for Jupyter | Many languages | Python | Markdown in comments | No | Yes | |
nbdev | Python and Jupyter Notebook | | | | | A library for developing a Python library in Jupyter notebooks, with all code, tests and documentation in one place. |
Julia | | | | Pluto.jl is a reactive notebook environment allowing custom order, but web-like macros are not supported. | Yes | Supports the IJulia mode of development, which was inspired by IPython. |
Agda | | | | | | Supports a limited form of literate programming out of the box. [24] |
Eve | | | | | | Programs are primarily prose. [25] Eve combines variants of Datalog and Markdown with a live graphical development environment. |
R Markdown Notebooks (or R Notebooks) | R, Python, Julia and SQL | | PDF, Microsoft Word, LibreOffice and presentation or slide show formats plus interactive formats like HTML widgets | No | Yes | [26] |
Quarto | R, Python, Julia and Observable | | PDF, Microsoft Word, LibreOffice and presentation or slide show formats plus interactive formats like HTML widgets | No | Yes | [26] |
Sweave | R | | | | | [27] [28] |
Knitr | R | | LaTeX, PDF, LyX, HTML, Markdown, AsciiDoc, and reStructuredText | | | [29] [30] |
Codebraid | Pandoc, Rust, Julia, Python, R, Bash | Python | Markdown | No | Yes | |
Pweave | Python | | | No | | |
MATLAB Live Editor | MATLAB | | Markdown | No | Yes | |
Inweb | C, C++, Inform 6, Inform 7 | C, CWEB | TeX, HTML | Yes? | | Used to write the Inform programming language since 2004. [31] |
Mercury | Python | Python, TypeScript | JSON format specification for ipynb | | | Turns Jupyter notebooks into interactive computational documents, which can be published as web applications, dashboards, reports, REST APIs, or slides. The executed document can be exported as a standalone HTML or PDF file, and documents can be scheduled for automatic execution. The document's presentation and widgets are controlled with a YAML header in the first cell of the notebook. |
Observable | JavaScript | JavaScript, TypeScript | TeX (KaTeX), HTML | | | Stored in the cloud with a web interface; contents are publishable as websites. Version controlled, with the platform defining its own version control operations. Code cells can be organized out of order; Observable notebooks construct the execution graph (a DAG) automatically. A rich standard library is implemented with modern JavaScript features. Cells from different Observable notebooks can reference each other, and npm libraries can be imported on the fly. |
Ganesha | JavaScript, TypeScript | JavaScript | Markdown | | | Enables Node.js to load literate modules, represented by Markdown files containing JavaScript or TypeScript code interspersed with richly formatted prose. Supports bundling literate modules for browsers when using the Rollup or Vite frontend module bundlers. |
JWEB | C, C++, JavaScript, TypeScript | JavaScript | Markdown | Yes | No | |
Other useful tools include:
Haskell has native support for semi-literate programming: its compilers and interpreters accept two file name extensions, .hs and .lhs, the latter standing for literate Haskell. A literate script can be a complete LaTeX source text and, at the same time, be compiled without changes, because the compiler only reads the text inside code environments, for example:

    % here text describing the function:
    \begin{code}
    fact :: Integer -> Integer
    fact 0 = 1
    fact n = n * fact (n - 1)
    \end{code}
    here more text

Code can also be marked in the Richard Bird style, by starting each line with a greater-than sign and a space, and by separating each block of code from the surrounding text with blank lines.
The LaTeX listings package provides an lstlisting environment that can be used to embellish source code. Its \lstnewenvironment command can define a code environment for literate Haskell, for example:

    \lstnewenvironment{code}{\lstset{language=Haskell}}{}

    \begin{code}
    comp :: (beta -> gamma) -> (alpha -> beta) -> (alpha -> gamma)
    (g `comp` f) x = g (f x)
    \end{code}

With further configuration (for instance the literate option of listings, which substitutes typeset symbols for character sequences), the code can be printed with arrows and Greek letters in place of ->, alpha, beta and gamma.
I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for program creation. But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships. A complex piece of software consists of simple parts and simple relations between those parts; the programmer's task is to state those parts and those relationships, in whatever order is best for human comprehension not in some rigidly determined order like top-down or bottom-up.
WEB's macros are allowed to have at most one parameter. Again, I did this in the interests of simplicity, because I noticed that most applications of multiple parameters could in fact be reduced to the one-parameter case. For example, suppose that you want to define something like [example elided] .... In other words, the name of one macro can usefully be a parameter to another macro.
Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s – it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably. ... Literate programming is what you need to rise above the ordinary level of achievement.
Another surprising thing that I learned while using WEB was that traditional programming languages had been causing me to write inferior programs, although I hadn't realized what I was doing. My original idea was that WEB would be merely a tool for documentation, but I actually found that my WEB programs were better than the programs I had been writing in other languages.
Thus the WEB language allows a person to express programs in a "stream of consciousness" order. TANGLE is able to scramble everything up into the arrangement that a PASCAL compiler demands. This feature of WEB is perhaps its greatest asset; it makes a WEB-written program much more readable than the same program written purely in PASCAL, even if the latter program is well commented. And the fact that there's no need to be hung up on the question of top-down versus bottom-up, since a programmer can now view a large program as a web, to be explored in a psychologically correct order is perhaps the greatest lesson I have learned from my recent experiences.
I chose the name WEB partly because it was one of the few three-letter words of English that hadn't already been applied to computers. But as time went on, I've become extremely pleased with the name, because I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials. We understand a complicated system by understanding its simple parts, and by understanding the simple relations between those parts and their immediate neighbors. If we express a program as a web of ideas, we can emphasize its structural properties in a natural and satisfying way.