Self-documenting code

Last updated

In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge. [1] In web development, self-documenting refers to a website that exposes the entire process of its creation through public documentation, and whose public documentation is part of the development process.[ citation needed ]

Contents

Objectives

Commonly stated objectives for self-documenting systems include:

Conventions

Self-documenting code is ostensibly written using human-readable names, typically consisting of a phrase in a human language which reflects the symbol's meaning, such as article.numberOfWords or TryOpen. The code must also have a clear and clean structure so that a human reader can easily understand the algorithm used.

Practical considerations

There are certain practical considerations that influence whether and how well the objectives for a self-documenting system can be realized.

Examples

Below is a very simple example of self-documenting C code, using naming conventions in place of explicit comments to make the logic of the code more obvious to human readers.

size_tcount_alphabetic_chars(constchar*text){if(text==NULL)return0;size_tcount=0;while(*text!='\0'){if(is_alphabetic(*text))count++;text++;}returncount;}

Criticism

Jef Raskin criticized the belief in "self-documenting" code by saying that code cannot explain the rationale behind why the program is being written or why it is implemented in such a way. [3]

See also

Related Research Articles

Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of procedures, by writing code in one or more programming languages. Programmers typically use high-level programming languages that are more easily intelligible to humans than machine code, which is directly executed by the central processing unit. Proficient programming usually requires expertise in several different subjects, including knowledge of the application domain, details of programming languages and generic code libraries, specialized algorithms, and formal logic.

<span class="mw-page-title-main">Literate programming</span> A programming approach of software development

Literate programming is a programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated. The approach is used in scientific computing and in data science routinely for reproducible research and open access purposes. Literate programming tools are used by millions of programmers today.

A metasyntactic variable is a specific word or set of words identified as a placeholder in computer science and specifically computer programming. These words are commonly found in source code and are intended to be modified or substituted before real-world usage. For example, foo and bar are used in over 330 Internet Engineering Task Force Requests for Comments, the documents which define foundational internet technologies like HTTP (web), TCP/IP, and email protocols.

<span class="mw-page-title-main">Quine (computing)</span> Self-replicating program

A quine is a computer program that takes no input and produces a copy of its own source code as its only output. The standard terms for these programs in the computability theory and computer science literature are "self-replicating programs", "self-reproducing programs", and "self-copying programs".

Software documentation is written text or illustration that accompanies computer software or is embedded in the source code. The documentation either explains how the software operates or how to use it, and may mean different things to people in different roles.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

<span class="mw-page-title-main">Syntax highlighting</span> Tool of editors for programming, scripting, and markup

Syntax highlighting is a feature of text editors that is used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colours and fonts according to the category of terms. This feature facilitates writing in a structured language such as a programming language or a markup language as both structures and syntax errors are visually distinct. This feature is also employed in many programming related contexts, either in the form of colorful books or online websites to make understanding code snippets easier for readers. Highlighting does not affect the meaning of the text itself; it is intended only for human readers.

In computing, a polyglot is a computer program or script written in a valid form of multiple programming languages or file formats. The name was coined by analogy to multilingualism. A polyglot file is composed by combining syntax from two or more different formats.

<span class="mw-page-title-main">Human-readable medium and data</span> Presentation of data for humans to read

In computing, a human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans, resulting in human-readable data. It is often encoded as ASCII or Unicode text, rather than as binary data.

<span class="mw-page-title-main">Code folding</span> Tool of editors for programming, scripting and markup

Code or text folding, or less commonly holophrasting, is a feature of some graphical user interfaces that allows the user to selectively hide ("fold") or display ("unfold") parts of a document. This allows the user to manage large amounts of text while viewing only those subsections that are currently of interest. It is typically used with documents which have a natural tree structure consisting of nested elements. Other names for these features include expand and collapse, code hiding, and outlining. In Microsoft Word, the feature is called "collapsible outlining".

Ctags is a programming tool that generates an index file of names found in source and header files of various programming languages to aid code comprehension. Depending on the language, functions, variables, class members, macros and so on may be indexed. These tags allow definitions to be quickly and easily located by a text editor, a code search engine, or other utility. Alternatively, there is also an output mode that generates a cross reference file, listing information about various names found in a set of language files in human-readable form.

Archy is a software system that had a user interface that introduced a different approach for interacting with computers with respect to traditional graphical user interfaces. Designed by human-computer interface expert Jef Raskin, it embodies his ideas and established results about human-centered design described in his book The Humane Interface. These ideas include content persistence, modelessness, a nucleus with commands instead of applications, navigation using incremental text search, and a zooming user interface (ZUI). The system was being implemented at the Raskin Center for Humane Interfaces under Raskin's leadership. Since his death in February 2005 the project was continued by his team, which later shifted focus to the Ubiquity extension for the Firefox browser.

<span class="mw-page-title-main">Markdown</span> Plain text markup language

Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as an easy-to-read markup language. Markdown is widely used for blogging and instant messaging, and also used elsewhere in online forums, collaborative software, documentation pages, and readme files.

AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions. AsciiDoc documents can be created using any text editor and read “as-is”, or rendered to HTML or any other format supported by a DocBook tool-chain, i.e. PDF, TeX, Unix manpages, e-books, slide presentations, etc. Common file extensions for AsciiDoc files are txt and adoc.

<span class="mw-page-title-main">SREC (file format)</span> File format developed by Motorola

Motorola S-record is a file format, created by Motorola in the mid-1970s, that conveys binary information as hex values in ASCII text form. This file format may also be known as SRECORD, SREC, S19, S28, S37. It is commonly used for programming flash memory in microcontrollers, EPROMs, EEPROMs, and other types of programmable logic devices. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. The HEX file is then imported by a programmer to "burn" the machine code into non-volatile memory, or is transferred to the target system for loading and execution.

Coding conventions are a set of guidelines for a specific programming language that recommend programming style, practices, and methods for each aspect of a program written in that language. These conventions usually cover file organization, indentation, comments, declarations, statements, white space, naming conventions, programming practices, programming principles, programming rules of thumb, architectural best practices, etc. These are guidelines for software structural quality. Software programmers are highly recommended to follow these guidelines to help improve the readability of their source code and make software maintenance easier. Coding conventions are only applicable to the human maintainers and peer reviewers of a software project. Conventions may be formalized in a documented set of rules that an entire team or company follows, or may be as informal as the habitual coding practices of an individual. Coding conventions are not enforced by compilers.

<span class="mw-page-title-main">Comment (computer programming)</span> Explanatory note in the source code of a computer program

In computer programming, a comment is a human-readable explanation or annotation in the source code of a computer program. They are added with the purpose of making the source code easier for humans to understand, and are generally ignored by compilers and interpreters. The syntax of comments in various programming languages varies considerably.

Dart is a programming language designed by Lars Bak and Kasper Lund and developed by Google. It can be used to develop web and mobile apps as well as server and desktop applications.

re2c is a free and open-source lexer generator for C, C++, Go, and Rust. It compiles declarative regular expression specifications to deterministic finite automata. Originally written by Peter Bumbulis and described in his paper, re2c was put in public domain and has been since maintained by volunteers. It is the lexer generator adopted by projects such as PHP, SpamAssassin, Ninja build system and others. Together with the Lemon parser generator, re2c is used in BRL-CAD. This combination is also used with STEPcode, an implementation of ISO 10303 standard.

FLOW is an educational programming language designed by Jef Raskin in 1970 and implemented on several minicomputers in the early 1970s. The goal of the language is to make it easy to explore algorithms through a highly interactive environment. The overall language is very similar in syntax and structure to the BASIC programming language, but has a number of changes in order to make typing code easier. Most notable among these was the concept of "typing amplification", in which short strings, often a single character, were expanded by the language into the complete "unamplified" source code. Modern integrated development environments and code-oriented text editors often include a similar feature, now normally referred to as autocomplete. The beginning programmer would first create a flow chart to solve the problem. Since all of the problems involved words the solution was intuitive. The flow chart would be translated into the flow programming language using a top-down, mechanical method.

References

  1. Schach, Stephen R. (2011). Object-Oriented and Classical Software Engineering (8 ed.). McGraw-Hill Professional. pp.  505–507. ISBN   978-0-07337618-9. OCLC   477254661.
  2. 1 2 3 4 5 Paul, Matthias R. (2002-04-09). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-03-24. Retrieved 2020-03-24. […] almost any numeric value in source code should be replaced by a corresponding symbol. This would greatly improve the self-explanatory aspect of source code and significantly ease maintenance of the code in the long run, as it would enable one to search for symbols to find relations between different excerpts of the code. […]
  3. Raskin, Jef (2005-03-18). "Comments Are More Important Than Code - The thorough use of internal documentation is one of the most-overlooked ways of improving software quality and speeding implementation". ACM Queue . Development. 3 (2). ACM, Inc. Archived from the original on 2020-03-24. Retrieved 2019-12-22.

Further reading