Syntax highlighting

HTML syntax highlighting

Syntax highlighting is a feature of text editors that are used for programming, scripting, or markup languages, such as HTML. The feature displays text, especially source code, in different colours and fonts according to the category of terms. [1] It facilitates writing in a structured language such as a programming language or a markup language, because both structures and syntax errors are visually distinct. It is also employed in many programming-related contexts, such as printed programming manuals and websites, to make code snippets easier for readers to understand. Highlighting does not affect the meaning of the text itself; it is intended only for human readers.

Syntax highlighting is a form of secondary notation, since the highlights are not part of the text meaning, but serve to reinforce it. Some editors also integrate syntax highlighting with other features, such as spell checking or code folding, as aids to editing which are external to the language.

Practical benefits

Highlighting the effect of missing delimiter (after watch='false) in JavaScript

Syntax highlighting is one strategy to improve the readability and context of the text, especially for code that spans several pages. The reader can easily ignore large sections of comments or code, depending on what they are looking for. Syntax highlighting also helps programmers find errors in their programs. For example, most editors highlight string literals in a different color, so a missing delimiter is much easier to spot because the text after it appears in the contrasting color. Brace matching is another important feature of many popular editors: by highlighting a pair of braces in a different color, it makes it simple to see whether a brace has been left out, or to locate the match of the brace the cursor is on.
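Brace matching can be illustrated with a minimal sketch. The function name `find_matching_bracket` and the simplification of ignoring brackets inside strings and comments are assumptions made for this example; real editors usually consult their tokenizer first.

```python
# Hypothetical helper: find the partner of the bracket under the cursor.
# Simplification: brackets inside string literals or comments are not skipped.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def find_matching_bracket(text, pos):
    """Return the index of the bracket matching text[pos], or None."""
    ch = text[pos]
    if ch in PAIRS:
        step, target = 1, PAIRS[ch]      # scan forward for the closer
    elif ch in CLOSERS:
        step, target = -1, CLOSERS[ch]   # scan backward for the opener
    else:
        return None                      # cursor is not on a bracket

    depth = 0
    i = pos
    while 0 <= i < len(text):
        if text[i] == ch:
            depth += 1                   # same kind of bracket nests deeper
        elif text[i] == target:
            depth -= 1
            if depth == 0:
                return i                 # found the partner
        i += step
    return None                          # unbalanced: no partner exists
```

An editor could highlight both positions when a match is returned, and flag the bracket under the cursor when the result is `None`.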

A study presented at the PPIG conference evaluated the effects of syntax highlighting on the comprehension of short programs, finding that the presence of syntax highlighting significantly reduces the time taken for a programmer to internalise the semantics of a program. [2] Additionally, data gathered from an eye-tracker during the study suggested that syntax highlighting enables programmers to pay less attention to standard syntactic components such as keywords.

Support in text editors

gedit supports syntax highlighting

Some text editors can also export the coloured markup in a format that is suitable for printing or for importing into word-processing and other kinds of text-formatting software, for instance as an HTML, colorized LaTeX, PostScript, or RTF version of their syntax highlighting. There are also several syntax highlighting libraries or "engines" that can be used in other applications but are not complete programs in themselves, for example the Generic Syntax Highlighter (GeSHi) extension for PHP.
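As a rough illustration of HTML export, the sketch below wraps already-classified tokens in `span` elements. The function name `to_html`, the `(kind, text)` token shape, and the use of the token kind as a CSS class name are assumptions for this example, not the output format of any particular editor or library.

```python
import html


def to_html(tokens):
    """Render (kind, text) pairs as HTML; kind None means unstyled text."""
    parts = []
    for kind, text in tokens:
        escaped = html.escape(text)  # protect <, >, & and quotes in the source
        if kind is None:
            parts.append(escaped)
        else:
            # The token kind is reused as a CSS class (an assumption here).
            parts.append('<span class="{}">{}</span>'.format(kind, escaped))
    return "<pre>" + "".join(parts) + "</pre>"
```

A stylesheet would then assign colors to classes such as `keyword` or `number`, which keeps the exported markup independent of any one color scheme.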

For editors that support more than one language, the user can usually specify the language of the text (such as C, LaTeX, or HTML), or the text editor can automatically recognize it based on the file extension or by scanning the contents of the file. This automatic language detection presents potential problems. For example, a user may want to edit a document containing:

In these cases, it is not clear what language to use, and a document may not be highlighted or be highlighted incorrectly.
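The two detection strategies mentioned above (file extension, then scanning the contents) might be sketched as follows. The tables `EXTENSIONS` and `INTERPRETERS` and the function `detect_language` are hypothetical and deliberately tiny; returning `None` corresponds to the case where the editor cannot decide and leaves the text unhighlighted.

```python
import os

# Illustrative lookup tables only; a real editor ships far larger ones.
EXTENSIONS = {".c": "c", ".h": "c", ".tex": "latex",
              ".html": "html", ".htm": "html", ".py": "python"}
INTERPRETERS = {"python": "python", "sh": "shell", "bash": "shell"}


def detect_language(filename, first_line=""):
    """Guess a language from the extension, then from the first line."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]

    # Fall back to content: a "#!" line names the interpreter.
    if first_line.startswith("#!"):
        parts = first_line[2:].split()
        if parts:
            name = os.path.basename(parts[0])
            if name == "env" and len(parts) > 1:
                name = parts[1]                  # "#!/usr/bin/env python3"
            name = name.rstrip("0123456789.")    # "python3.11" -> "python"
            if name in INTERPRETERS:
                return INTERPRETERS[name]

    if first_line.lstrip().lower().startswith("<!doctype html"):
        return "html"
    return None  # unknown: leave the text unhighlighted
```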

Syntax elements

Most editors with syntax highlighting allow different colors and text styles to be given to dozens of different lexical sub-elements of syntax. These include keywords, comments, control-flow statements, variables, and other elements. Programmers often heavily customize their settings in an attempt to show as much useful information as possible without making the code difficult to read.

Some editors also display certain syntactical elements in more visually pleasing ways, a practice called syntax decoration: for example, by replacing a pointer operator like -> in source code with an actual arrow symbol (→), or by replacing text decoration cues like /italics/, *boldface*, or _underline_ in source code comments with actual italic, boldface, or underlined presentation.
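The arrow replacement can be sketched as a display-only transformation. The function name `decorate` is hypothetical, and the rule of leaving double-quoted string literals untouched is an assumption of this example; the underlying file content is never modified, only the text shown to the reader.

```python
import re

# Match either a double-quoted string literal (kept as-is) or the operator.
_DECORATION_RE = re.compile(r'"(?:\\.|[^"\\])*"|->')


def decorate(line):
    """Return the line as it would be displayed, with -> shown as an arrow."""
    def swap(match):
        token = match.group(0)
        return "\u2192" if token == "->" else token  # U+2192 is the arrow
    return _DECORATION_RE.sub(swap, line)
```

For instance, `decorate('p->next = "a->b";')` replaces only the first operator and leaves the one inside the string literal alone.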

Examples

Below is a comparison of a snippet of C code:

Standard rendering:

    /* Hello World */
    #include <stdlib.h>
    #include <stdio.h>

    int main() {
        printf("Hello World\n");
        return 0;
    }

Syntax highlighting:

    /* Hello World */
    #include <stdlib.h>
    #include <stdio.h>

    int main() {
        printf("Hello World\n");
        return 0;
    }

Below is another snippet of syntax highlighted C++ code:

    // Create "window_count" Window objects:
    const auto window_count = int{10};
    auto windows = std::array<std::shared_ptr<Window>, max_window_count>{};
    for (auto i = int{0}; i < window_count; ++i) {
        windows[i] = std::make_shared<Window>();
    }

In the C++ example, the editor has recognized the keywords const, auto, int, and for. The comment at the beginning is also highlighted in a specific manner to distinguish it from working code.

History and limitations

The ideas of syntax highlighting overlap significantly with those of syntax-directed editors. One of the first such editors for code was Wilfred Hansen's 1969 code editor, Emily. [3] [4] It provided advanced language-independent code completion facilities, and unlike modern editors with syntax highlighting, actually made it impossible to create syntactically incorrect programs.

In 1982, Anita H. Klock and Jan B. Chodak filed a patent for the first known syntax highlighting system, [5] which was used in the Intellivision's Entertainment Computer System (ECS) peripheral, released in 1983. [6] It would highlight different elements of BASIC programs and was implemented in an attempt to make it easier for beginners, especially children, to start writing code. [7] Later, the Live Parsing Editor (LEXX), written for the VM operating system for the computerization of the Oxford English Dictionary in 1985, was one of the first to use color syntax highlighting. Its live parsing capability allowed user-supplied parsers to be added to the editor, for text, programs, data files, etc. [8] On microcomputers, MacPascal 1.0 (October 10, 1985) recognized Pascal syntax as it was typed, used font changes (e.g., bold for keywords) to highlight syntax on the monochrome compact Macintosh, and automatically indented code to match its structure. [9]

Some text editors and code formatting tools perform syntax highlighting using pattern-matching heuristics (e.g. regular expressions) rather than implementing a parser for each possible language. [10] This can result in somewhat inaccurate highlighting and, in some cases, slow rendering. To limit the cost, some text editors parse only the visible area rather than the whole file, sometimes scanning backwards in the text up to a limited number of lines for "syncing".
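A regular-expression approach of this kind might look like the following sketch for a small C-like subset. The keyword set, the token categories, and the function name `classify` are assumptions chosen for illustration, not the rules of any real editor; operators and punctuation are simply skipped.

```python
import re

# Illustrative keyword subset only.
KEYWORDS = {"const", "auto", "int", "for", "return", "if", "else", "while"}

# Alternatives are tried in order, so comments and strings win over words.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<number>\b\d+(?:\.\d+)?\b)
  | (?P<word>[A-Za-z_]\w*)
""", re.VERBOSE | re.DOTALL)


def classify(source):
    """Return (category, text) pairs for the tokens the patterns recognize."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "word":
            # Words are split into keywords and ordinary identifiers.
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```

Because the patterns know nothing about the grammar, an unterminated string or comment is classified differently than a compiler would treat it, which is the kind of inaccuracy described above.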

On the other hand, an editor often displays code while it is being written, when it is incomplete or incorrect, and strict parsers (like the ones used in compilers) would fail to parse the code most of the time.

Some modern, language-specific IDEs (in contrast to text editors) perform full language parsing which results in very accurate understanding of code. An extension of syntax highlighting was called "semantic highlighting" in 2009 by David Nolden [11] for the open-source C++ IDE KDevelop. For example, semantic highlighting may give local variables unique distinct colors to improve the comprehensibility of code. In 2014 the idea of colored local variables was further popularized due to a blog post by Evan Brooks, [12] and after that, the idea was transferred to other popular IDEs like Visual Studio, [13] Xcode, [14] and others.
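The idea of giving each local variable its own color can be sketched as below. This is an illustrative assignment scheme under stated assumptions, not the method used by KDevelop or any other IDE: the `PALETTE` values and the function `assign_colors` are hypothetical, and the list of identifiers is assumed to come from a parser that has already resolved which names are local variables.

```python
# Hypothetical palette; colors repeat once it is exhausted.
PALETTE = ["#e06c75", "#98c379", "#61afef", "#c678dd", "#d19a66", "#56b6c2"]


def assign_colors(identifiers, palette=PALETTE):
    """Map each distinct name to a color, in order of first appearance."""
    colors = {}
    for name in identifiers:
        if name not in colors:
            colors[name] = palette[len(colors) % len(palette)]
    return colors
```

Every occurrence of a name then shares one color, so a reader can follow a variable through a function at a glance.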

Color in a user interface is less useful if the user has some degree of color blindness.

References

  1. Jim D'Anjou; Sherry Shavor; Scott Fairbrother; Dan Kehn; John Kellerman; Pat McCarthy (2005). The Java developer's guide to Eclipse (2nd ed.). Boston: Addison-Wesley. ISBN 978-0-321-30502-2.
  2. Sarkar, Advait (2015). "The impact of syntax colouring on program comprehension". Proceedings of the 26th Annual Conference of the Psychology of Programming Interest Group: 49–58. Archived from the original on 24 September 2015. Retrieved 27 September 2023.
  3. Hansen, Wilfred J. (1971). "User engineering principles for interactive systems". Proceedings of the Fall Joint Computer Conference FJCC 39. AFIPS. pp. 523–532.
  4. Hansen, Wilfred. "Emily - An Editor for Structured Text". Retrieved 17 June 2013.
  5. Syntax error correction method and apparatus, 1982-10-29, retrieved 2018-04-12
  6. Mattel Intellivision: Intellivision Computer Module Owner's Guide (1983)(Mattel)(US). 1983.
  7. "Intellivision Classic Video Game System / Entertainment Computer System". www.intellivisionlives.com. Archived from the original on 2018-07-17. Retrieved 2018-04-12.
  8. Cowlishaw, M. F. (1987). "LEXX – A programmable structured editor" (PDF). IBM Journal of Research and Development, Vol 31, No. 1, IBM Reprint order number G322-0151. IBM.
  9. Allen, Dan (2011-10-10). "A Trio of Historical Recollections". mpw-dev (Mailing list). Archived from the original on 2017-08-21. Retrieved 12 September 2019.
  10. "KEDIT Language Definition Files". Kedit. Mansfield Software Group, Inc. 2012. Retrieved 2016-04-07.
  11. "2009 blog post on Semantic Highlighting introduced in KDevelop by David Nolden". 8 January 2009.
  12. "2014 blog post on Semantic Highlighting by Evan Brooks". 17 April 2017.
  13. "Visual Studio Magazine article on semantic highlighting".
  14. "Github page of a plugin which implements semantic highlighting for Xcode". GitHub. 14 September 2022.