Pretty-printing (or prettyprinting) is the application of any of various stylistic formatting conventions to text files, such as source code, markup, and similar kinds of content. These formatting conventions may entail adhering to an indentation style, using different color and typeface to highlight syntactic elements of source code, or adjusting size, to make the content easier for people to read, and understand. Pretty-printers for source code are sometimes called code formatters or beautifiers.
Pretty-printing usually refers to displaying mathematical expressions similar to the way they would be typeset professionally. For example, in computer algebra systems such as Maxima or Mathematica the system may write output like "x ^ 2 + 3 * x" as "". Some graphing calculators, such as the Casio 9860 series, HP-49/50 series and HP Prime, TI-84 Plus, TI-89, and TI-Nspire, the TI-83 Plus with the PrettyPt [1] add-on, or the TI-84 Plus with the same add-on or the "MathPrint"-enabled OSes, can perform pretty-printing. Additionally, a number of newer scientific calculators are equipped with dot matrix screens capable of pretty-printing such as the Casio FX-ES series (Natural Display), Sharp EL-W series (WriteView), HP SmartCalc 300s, TI-30XB, and Numworks.
Many text formatting programs can also typeset mathematics: TeX was developed specifically for high-quality mathematical typesetting.
Pretty-printing in markup language instances is most typically associated with indentation of tags and string content to visually determine hierarchy and nesting. Although the syntactical structures of tag-based languages do not significantly vary, the indentation may vary significantly due to how a markup language is interpreted or due to the data it describes.
In MathML, whitespace characters do not reflect data, meaning, or syntax above what is required by XML syntax. In HTML, whitespace characters between tags are considered text and are parsed as text nodes into the parsed result. [2] While indentation may be generously applied to a MathML document, sufficient additional care must be taken in pretty-printing an HTML document to ensure additional text nodes are not created or destroyed in general proximity to the content or content-reflective tag elements. This difference in complexity is non-trivial from the perspective of an automated pretty-print operation where no special rules or edge cases are necessary, as in the more simple MathML example. The HTML example may require a series of progressive interrelated algorithms to account for various patterns of tag elements and content that conforms to a uniform style and is consistent in application across various instances, as evidenced by the markup.ts [3] application component used to beautify HTML, XML, and related technologies for the Pretty Diff tool.
Programmers often use tools to format programming language source code in a particular manner. Proper code formatting makes it easier to read and understand. Different programmers often prefer different styles of formatting, such as the use of code indentation and whitespace or positioning of braces. A code formatter or code indenter converts source code from one format style to another. This is relatively straightforward because of the unambiguous syntax of programming languages. Code beautification involves parsing the source code into component structures, such as assignment statements, if blocks, loops, etc. (see also control flow), and formatting them in a manner specified by the user in a configuration file.
Code beautifiers exist as standalone applications and built into text editors and integrated development environments. For example, Emacs' various language modes can correctly indent blocks of code attractively. [4]
An early example of pretty-printing was Bill Gosper's "GRINDEF" (i.e. 'grind function') program (c. 1967), which used combinatorial search with pruning to format LISP programs. Early versions operated on the executable (list structure) form of the Lisp program and were oblivious to the special meanings of various functions. Later versions had special read conventions for incorporating non-executable comments and also for preserving read macros in unexpanded form. They also allowed special indentation conventions for special functions such as if
. [5] [6] The term "grind" was used in some Lisp circles as a synonym for pretty-printing. [7]
Many open source projects have established rules for code layout. The most typical are the GNU formatting [8] and the BSD style. [9] The biggest difference between the two is the location of the braces: in the GNU style, opening and closing braces are on lines by themselves, with the same indent. BSD style places an opening brace at the end of the preceding line, and the closing braces can be followed by else. The size of indent and location of whitespace also differs.
The following example shows some typical C structures and how various indentation style rules format them. Without any formatting at all, it looks like this:
intfoo(intk){if(k<1||k>2){printf("out of range\n");printf("this function requires a value of 1 or 2\n");}else{printf("Switching\n");switch(k){case1:printf("1\n");break;case2:printf("2\n");break;}}}
The GNU indent program produces the following output when asked to indent according to the GNU rules:
intfoo(intk){if(k<1||k>2){printf("out of range\n");printf("this function requires a value of 1 or 2\n");}else{printf("Switching\n");switch(k){case1:printf("1\n");break;case2:printf("2\n");break;}}}
It produces this output when formatting according to BSD rules:
intfoo(intk){if(k<1||k>2){printf("out of range\n");printf("this function requires a value of 1 or 2\n");}else{printf("Switching\n");switch(k){case1:printf("1\n");break;case2:printf("2\n");break;}}}
Related concepts
Utilities
A "Hello, World!" program is generally a simple computer program which outputs to the screen a message similar to "Hello, World!" while ignoring any user input. A small piece of code in most general-purpose programming languages, this program is used to illustrate a language's basic syntax. A "Hello, World!" program is often the first written by a student of a new programming language, but such a program can also be used as a sanity check to ensure that the computer software intended to compile or run source code is correctly installed, and that its operator understands how to use it.
A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software. Text editors are provided with operating systems and software development packages, and can be used to change files such as configuration files, documentation files and programming language source code.
In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.
A man page is a form of software documentation usually found on a Unix or Unix-like operating system. Topics covered include computer programs, formal standards and conventions, and even abstract concepts. A user may invoke a man page by issuing the man
command.
Programming style, also known as coding style or code style, is a set of rules or guidelines that governs the layout of source code. Programming style may also refer an quality aspect of code that is interpreted subjectively.
In computing, a polyglot is a computer program or script written in a valid form of multiple programming languages or file formats. The name was coined by analogy to multilingualism. A polyglot file is composed by combining syntax from two or more different formats.
In computer programming, indentation style is a convention, a.k.a. style, governing the indentation of blocks of source code. An indentation style generally involves consistent width of whitespace before each line of a block, so that the lines of code appear to be related, and dictates whether to use space or tab characters for the indentation whitespace.
YAML is a human-readable data serialization language. It is commonly used for configuration files and in applications where data are being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax that intentionally differs from Standard Generalized Markup Language (SGML). It uses Python-style indentation to indicate nesting and does not require quotes around most string values.
In computer programming, a free-form language is a programming language in which the positioning of characters on the page in program text is insignificant. Program text does not need to be placed in specific columns as on old punched card systems, and frequently ends of lines are insignificant. Whitespace characters are used only to delimit tokens, and have no other significance. Free-form languages allow a greater degree of flexibility and have fewer syntactic rules to learn, which could lower the entry barrier for beginners.
printf is a C standard library function that formats text and writes it to standard output.
Code or text folding, or less commonly holophrasting, is a feature of some graphical user interfaces that allows the user to selectively hide ("fold") or display ("unfold") parts of a document. This allows the user to manage large amounts of text while viewing only those subsections that are currently of interest. It is typically used with documents which have a natural tree structure consisting of nested elements. Other names for these features include expand and collapse, code hiding, and outlining. In Microsoft Word, the feature is called "collapsible outlining".
The off-side rule describes syntax of a computer programming language that defines the bounds of a code block via indentation.
In the written form of many languages, indentation describes empty space, a.k.a. white space, used around text to signify an important aspect of the text such as:
indent is a Unix utility that reformats C and C++ code in a user-defined indentation style and coding style. Support for C++ code is minimal.
The GNU coding standards are a set of rules and guidelines for writing programs that work consistently within the GNU system. The GNU Coding Standards were written by Richard Stallman and other GNU Project volunteers. The standards document is part of the GNU Project and is available from the GNU website. Though it focuses on writing free software for GNU in C, much of it can be applied more generally. In particular, the GNU Project encourages its contributors to always try to follow the standards—whether or not their programs are implemented in C.
A whitespace character is a character data element that represents white space when text is rendered for display by a computer.
Kernel normal form, or KNF, is the coding style used in the development of code for the BSD operating systems. Based on the original KNF concept from the Computer Systems Research Group, it dictates a programming style to which contributed code should adhere prior to its inclusion into the codebase. KNF started out as a codification of how Ken Thompson and Dennis Ritchie formatted the original UNIX C source code. It describes such things as how to name variables, use indents and the use of ANSI C or K&R C code styles. Each BSD variant has its own KNF rules, which have evolved over time to differ from each other in small ways.
GNU Emacs is a free software text editor. It was created by GNU Project founder Richard Stallman, based on the Emacs editor developed for Unix operating systems. GNU Emacs has been a central component of the GNU project and a flagship project of the free software movement. Its tag line is "the extensible self-documenting text editor."
Getopt is a C library function used to parse command-line options of the Unix/POSIX style. It is a part of the POSIX specification, and is universal to Unix-like systems. It is also the name of a Unix program for parsing command line arguments in shell scripts.
OpenLisp is a programming language in the Lisp family developed by Christian Jullien from Eligis. It conforms to the international standard for ISLISP published jointly by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC), ISO/IEC 13816:1997(E), revised to ISO/IEC 13816:2007(E).