Prettyprint

Pretty-printing (or prettyprinting) is the application of any of various stylistic formatting conventions to text files, such as source code, markup, and similar kinds of content. These formatting conventions may entail adhering to an indentation style, using different colors and typefaces to highlight syntactic elements of source code, or adjusting size, all to make the content easier for people to read and understand. Pretty-printers for source code are sometimes called code formatters or beautifiers.

Pretty-printing mathematics

A typeset mathematical expression

Pretty-printing usually refers to displaying mathematical expressions similar to the way they would be typeset professionally. For example, in computer algebra systems such as Maxima or Mathematica, the system may write output like "x ^ 2 + 3 * x" as "x² + 3x". Some graphing calculators can also perform pretty-printing, such as the Casio 9860 series, HP-49/50 series and HP Prime, TI-84 Plus, TI-89, and TI-Nspire, the TI-83 Plus with the PrettyPt [1] add-on, or the TI-84 Plus with the same add-on or the "MathPrint"-enabled OSes. Additionally, a number of newer scientific calculators are equipped with dot-matrix screens capable of pretty-printing, such as the Casio FX-ES series (Natural Display), Sharp EL-W series (WriteView), HP SmartCalc 300s, TI-30XB, and Numworks.

Many text formatting programs can also typeset mathematics: TeX was developed specifically for high-quality mathematical typesetting.

Pretty-printing markup and tag-based code

HTML source code, pretty-printed to better show the hierarchical relationships of its elements (called tags)

Pretty-printing of markup language instances is most typically associated with indenting tags and string content to make hierarchy and nesting visually apparent. Although the syntactic structures of tag-based languages do not vary significantly, the appropriate indentation may vary significantly depending on how a markup language is interpreted or on the data it describes.

In MathML, whitespace characters do not reflect data, meaning, or syntax beyond what XML syntax requires. In HTML, whitespace characters between tags are considered text and are parsed as text nodes into the parsed result. [2] Indentation may therefore be applied generously to a MathML document, whereas pretty-printing an HTML document requires additional care to ensure that text nodes are not created or destroyed near the content or content-reflective tag elements. This difference in complexity is significant for an automated pretty-print operation: the simpler MathML case needs no special rules or edge cases, while the HTML case may require a series of progressive, interrelated algorithms that account for various patterns of tag elements and content, conform to a uniform style, and apply consistently across instances, as evidenced by the markup.ts [3] application component used to beautify HTML, XML, and related technologies for the Pretty Diff tool.
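To illustrate why indentation is not neutral in HTML, the following minimal sketch counts the runs of characters that fall between tags, each of which a DOM parser would typically turn into a text node. The function name count_text_runs is hypothetical, and the scanner is deliberately naive: it is not an HTML parser and assumes that no '>' appears inside attribute values, comments, or scripts.

```c
#include <stddef.h>

/*
 * Count maximal runs of characters that lie outside <...> tags.
 * Whitespace-only runs are counted too, because in HTML they are
 * text just like any other characters between tags.
 * Hypothetical helper for illustration; assumes well-formed input
 * with no '>' inside attribute values or comments.
 */
size_t count_text_runs(const char *markup)
{
    size_t runs = 0;
    int in_tag = 0;   /* currently between '<' and '>' */
    int in_text = 0;  /* currently inside a run of text */

    for (const char *p = markup; *p != '\0'; p++) {
        if (in_tag) {
            if (*p == '>')
                in_tag = 0;
        } else if (*p == '<') {
            in_tag = 1;
            in_text = 0;      /* a tag ends the current text run */
        } else if (!in_text) {
            in_text = 1;      /* first character of a new text run */
            runs++;
        }
    }
    return runs;
}
```

Under these assumptions, the compact fragment <p><b>x</b></p> contains one text run, while the same fragment with each tag indented on its own line contains three. A pretty-printer that adds line breaks and indentation around inline elements therefore changes the parsed result, which is the extra care the HTML case demands.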

Programming code formatting

Programmers often use tools to format programming language source code in a particular manner. Proper formatting makes code easier to read and understand. Different programmers often prefer different formatting styles, which differ in conventions such as indentation, whitespace, and the positioning of braces. A code formatter or code indenter converts source code from one format style to another. This is relatively straightforward because of the unambiguous syntax of programming languages. Code beautification involves parsing the source code into component structures, such as assignment statements, if blocks, and loops (see also control flow), and formatting them in a manner specified by the user in a configuration file.
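As a rough illustration of the idea, the sketch below re-indents brace-delimited source text in a single pass. It is a simplification under stated assumptions, not how any particular beautifier works: instead of parsing into statements it only tracks brace depth, parentheses, and double-quoted strings, it ignores comments and character literals, and the indent width of four spaces is hard-coded rather than read from a configuration file. The names reindent, outbuf, emit, and emit_indent are hypothetical.

```c
#include <stddef.h>

/* Output buffer with a fixed capacity; characters past it are dropped. */
struct outbuf {
    char *data;
    size_t len;
    size_t cap;
};

static void emit(struct outbuf *o, char c)
{
    if (o->len + 1 < o->cap)        /* keep one byte for the terminator */
        o->data[o->len++] = c;
}

static void emit_indent(struct outbuf *o, int depth)
{
    for (int i = 0; i < depth * 4; i++)   /* assumed width: 4 spaces */
        emit(o, ' ');
}

/*
 * Copy src to out, starting a new line after every '{', after every
 * ';' outside parentheses, and around every '}', and indenting each
 * line by the current brace depth.
 */
void reindent(const char *src, char *out, size_t cap)
{
    struct outbuf o = { out, 0, cap };
    int depth = 0, parens = 0, in_string = 0, line_start = 1;

    if (cap == 0)
        return;

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;

        if (in_string) {                  /* copy string literals verbatim */
            emit(&o, c);
            if (c == '\\' && p[1] != '\0')
                emit(&o, *++p);           /* keep the escaped character */
            else if (c == '"')
                in_string = 0;
            continue;
        }
        if (c == '\n' || c == '\t')
            c = ' ';                      /* discard the old layout */
        if (line_start && c == ' ')
            continue;                     /* drop leading whitespace */
        if (c == '}') {
            if (!line_start)
                emit(&o, '\n');
            if (depth > 0)
                depth--;
            emit_indent(&o, depth);
            emit(&o, '}');
            emit(&o, '\n');
            line_start = 1;
            continue;
        }
        if (line_start) {
            emit_indent(&o, depth);
            line_start = 0;
        }
        emit(&o, c);
        if (c == '"') {
            in_string = 1;
        } else if (c == '(') {
            parens++;
        } else if (c == ')' && parens > 0) {
            parens--;
        } else if (c == '{' || (c == ';' && parens == 0)) {
            if (c == '{')
                depth++;
            emit(&o, '\n');               /* end the line after '{' or ';' */
            line_start = 1;
        }
    }
    out[o.len] = '\0';
}
```

Calling reindent on a one-line function body places each statement and each closing brace on its own line at the matching depth. A real formatter would go further, for example keeping else on the same line as a preceding closing brace or handling case labels, which is where parsing into proper component structures becomes necessary.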

Code beautifiers exist both as standalone applications and as features built into text editors and integrated development environments. For example, Emacs' various language modes can indent blocks of code correctly and attractively. [4]

Lisp pretty-printer

An early example of pretty-printing was Bill Gosper's "GRINDEF" (i.e. 'grind function') program (c. 1967), which used combinatorial search with pruning to format LISP programs. Early versions operated on the executable (list structure) form of the Lisp program and were oblivious to the special meanings of various functions. Later versions had special read conventions for incorporating non-executable comments and also for preserving read macros in unexpanded form. They also allowed special indentation conventions for special functions such as if. [5] [6] The term "grind" was used in some Lisp circles as a synonym for pretty-printing. [7]

Project style rules

Many open source projects have established rules for code layout. The most typical are the GNU formatting [8] and the BSD style. [9] The biggest difference between the two is the location of the braces: in the GNU style, opening and closing braces are on lines by themselves, with the same indent. The BSD style places an opening brace at the end of the preceding line, and a closing brace can be followed by else on the same line. The size of the indent and the placement of whitespace also differ.
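The brace-placement difference can be sketched as a single formatting choice. The function format_block and the enum brace_style below are hypothetical names introduced for illustration only. The indent widths (braces indented two spaces and the body two more for GNU, one tab for BSD) follow common descriptions of the two styles and should be treated as assumptions here.

```c
#include <stdio.h>

enum brace_style { STYLE_GNU, STYLE_BSD };

/*
 * Write a one-statement block headed by `header` (for example an
 * if condition) into out, using the chosen brace style.
 * Returns what snprintf returns: the length the full text needs.
 */
int format_block(char *out, size_t cap, enum brace_style style,
                 const char *header, const char *stmt)
{
    if (style == STYLE_GNU) {
        /* Braces on their own lines, both at the same indent. */
        return snprintf(out, cap, "%s\n  {\n    %s\n  }\n", header, stmt);
    }
    /* Opening brace at the end of the preceding line. */
    return snprintf(out, cap, "%s {\n\t%s\n}\n", header, stmt);
}
```

With the header "if (k < 1)" and the statement "return;", the GNU branch yields four lines with the braces alone on the second and fourth, while the BSD branch yields three lines with the opening brace on the first.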

Example of formatting and beautifying code

The following example shows some typical C structures and how various indentation style rules format them. Without any formatting at all, it looks like this:

int foo(int k) { if (k < 1 || k > 2) { printf("out of range\n"); printf("this function requires a value of 1 or 2\n"); } else { printf("Switching\n"); switch (k) { case 1: printf("1\n"); break; case 2: printf("2\n"); break; } } }

The GNU indent program produces the following output when asked to indent according to the GNU rules:

int
foo (int k)
{
  if (k < 1 || k > 2)
    {
      printf ("out of range\n");
      printf ("this function requires a value of 1 or 2\n");
    }
  else
    {
      printf ("Switching\n");
      switch (k)
        {
        case 1:
          printf ("1\n");
          break;
        case 2:
          printf ("2\n");
          break;
        }
    }
}

It produces this output when formatting according to BSD rules:

int
foo(int k)
{
        if (k < 1 || k > 2) {
                printf("out of range\n");
                printf("this function requires a value of 1 or 2\n");
        } else {
                printf("Switching\n");
                switch (k) {
                case 1:
                        printf("1\n");
                        break;
                case 2:
                        printf("2\n");
                        break;
                }
        }
}

References

  1. "PrettyPrint - ticalc.org". www.ticalc.org. Retrieved 2022-04-13.
  2. Baron, L. David. "Whitespace in the DOM". Mozilla Developer Network. Retrieved 2012-08-27.
  3. markup.ts
  4. Stallman, Richard M. "Indentation for Programs". GNU Emacs Manual. Free Software Foundation. Retrieved 2011-10-20.
  5. Ira Goldstein, "Pretty Printing: Converting List to Linear Structure", Artificial Intelligence Memo 279, Massachusetts Institute of Technology, February 1973.
  6. Richard C. Waters, "Using the new common Lisp pretty printer", ACM SIGPLAN Lisp Pointers 5:2:27-34, April–June 1992.
  7. Jargon File, s.v. grind
  8. GNU style
  9. BSD style