The off-side rule describes programming language syntax in which indentation defines the bounds of a code block. [1] [2]
The term was coined by Peter Landin, possibly as a pun on the offside law in association football.
An off-side rule language is contrasted with a free-form language in which indentation has no syntactic meaning, and indentation is strictly a matter of style.
An off-side rule language is also described as having significant indentation.
Peter Landin, in his 1966 article "The Next 700 Programming Languages", defined the off-side rule thus: "Any non-whitespace token to the left of the first such token on the previous line is taken to be the start of a new declaration." [3]
The following is an example of indentation blocks in Python, a popular off-side rule language. In Python, the rule is taken to define the boundaries of statements rather than declarations.
def is_even(a: int) -> bool:
    if a % 2 == 0:
        print('Even!')
        return True
    print('Odd!')
    return False
The body of the function starts on line 2 since it is indented one level (4 spaces) more than the previous line. The if clause body starts on line 3 since it is indented an additional level, and ends on line 4 since line 5 is indented a level less, a.k.a. outdented.
The colon (:) at the end of a control statement line is Python syntax, not an aspect of the off-side rule. The rule can be realized without such colon syntax.
The off-side rule can be implemented in the lexical analysis phase, as in Python, where an increase in indentation causes the lexer to emit an INDENT token and a decrease causes it to emit a DEDENT token. [4] These tokens correspond to the opening brace { and closing brace } in languages that use braces for blocks, which means that the phrase grammar does not depend on whether braces or indentation are used. This requires the lexer to hold state, namely the current indent level, so that it can detect changes in indentation. As a result, the lexical grammar is not context-free: INDENT and DEDENT depend on the contextual information of the prior indent level.
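The lexer-based approach described above can be sketched as follows. This is a minimal illustration, not Python's actual tokenizer: the function names indent_tokens and to_braces and the LINE token kind are hypothetical, indentation is assumed to consist of spaces only, and comments, brackets, and line continuations are ignored. The stack of indent widths is the state the lexer has to carry from line to line.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text); the kind names are illustrative only


def indent_tokens(source: str) -> Iterator[Token]:
    """Yield INDENT, DEDENT and LINE tokens for a piece of source text.

    Simplifications: spaces-only indentation, blank lines are skipped, and
    comments, brackets and line continuations are not handled.
    """
    stack: List[int] = [0]  # lexer state: indent widths of the open blocks
    for raw in source.splitlines():
        text = raw.lstrip(" ")
        if not text.strip():
            continue  # blank lines never change the block structure
        width = len(raw) - len(text)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", "")
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise IndentationError(f"inconsistent dedent: {raw!r}")
        yield ("LINE", text)
    # Close every block that is still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


def to_braces(source: str) -> str:
    """Render the token stream with braces in place of INDENT and DEDENT."""
    symbols = {"INDENT": "{", "DEDENT": "}"}
    return " ".join(symbols.get(kind, text) for kind, text in indent_tokens(source))


SAMPLE = (
    "def is_even(a):\n"
    "    if a % 2 == 0:\n"
    "        return True\n"
    "    return False\n"
)
print(to_braces(SAMPLE))
```

Because to_braces only substitutes one token for another, a parser consuming this stream would not need to know whether the blocks were originally written with indentation or with braces.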
The primary alternative to delimiting blocks by indenting, popularized by the broad use and influence of the language C, is to ignore whitespace characters and mark blocks explicitly with curly brackets (i.e., { and }) or some other delimiter. While this allows more formatting freedom (a developer might choose not to indent small pieces of code such as break and continue statements), sloppily indented code might lead the reader astray, as in the goto fail bug.
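For contrast, in an off-side rule language the indentation a reader sees is the same structure the parser uses, so the layout cannot disagree with the block boundaries. The following sketch uses two hypothetical Python functions that differ only in the indentation of one line; that difference alone changes which block the line belongs to.

```python
def running_totals_inside(values):
    log = []
    total = 0
    for v in values:
        total += v
        log.append(total)  # indented: part of the loop body, runs every iteration
    return total, log


def running_totals_outside(values):
    log = []
    total = 0
    for v in values:
        total += v
    log.append(total)  # outdented: after the loop, runs exactly once
    return total, log


print(running_totals_inside([1, 2, 3]))
print(running_totals_outside([1, 2, 3]))
```

In a brace-delimited language, both layouts could be attached to the same brace structure, which is how indentation and actual control flow can drift apart.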
Lisp and other S-expression-based languages do not differentiate statements from expressions, and parentheses are enough to control the scoping of all statements within the language. As in curly bracket languages, whitespace is mostly ignored by the reader (i.e., the read function). Whitespace is used to separate tokens. [5] The explicit structure of Lisp code allows automatic indenting, to form a visual cue for human readers.
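The point about automatic indenting can be illustrated with a small sketch. It assumes a toy S-expression syntax with only parentheses and atoms (no strings, quotes, or comments), and the names parse and pretty are hypothetical. The layout it produces is a simple one-item-per-line scheme, not the conventional Lisp style; the point is only that the indentation is computed entirely from the explicit parentheses.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def parse(text: str) -> SExpr:
    """Parse a single S-expression; whitespace only separates tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items: List[SExpr] = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        return token

    return read()


def pretty(expr: SExpr, depth: int = 0) -> str:
    """Lay out nested lists one item per line, indented by nesting depth."""
    pad = "  " * depth
    if isinstance(expr, str):
        return pad + expr
    if all(isinstance(item, str) for item in expr):
        return pad + "(" + " ".join(expr) + ")"  # flat lists stay on one line
    lines = [pretty(item, depth + 1) for item in expr]
    lines[0] = pad + "(" + lines[0].lstrip()  # opening parenthesis before the first item
    lines[-1] += ")"  # closing parenthesis after the last item
    return "\n".join(lines)


one_line = "(defun f (x) (if (> x 0) x (- x)))"
oddly_spaced = "(defun f\n(x)   (if (> x 0)\n x (- x)))"
print(pretty(parse(one_line)))
```

Both input strings above parse to the same structure, so they are printed identically regardless of how the original text was spaced.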
Another alternative is for each block to begin and end with explicit keywords. For example, in ALGOL 60 and its descendant Pascal, blocks start with the keyword begin and end with the keyword end. In some languages (but not Pascal), this means that newlines are important [citation needed] (unlike in curly brace languages), but the indentation is not. In BASIC and Fortran, blocks begin with the block name (such as IF) and end with the block name prepended with END (e.g., END IF). In Fortran, each and every block can also have its own unique block name, which adds another level of explicitness to lengthy code. ALGOL 68 and the Bourne shell (sh and bash) are similar, but the end of the block is usually given by the name of the block written backward (e.g., case starts a switch statement that spans until the matching esac; similarly, conditionals are written if ... then ... [elif ... [else ...]] fi, and for loops are written for ... do ... od in ALGOL 68 or for ... do ... done in bash).
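Keyword-delimited blocks of this kind can be checked with a stack, as in the following sketch. It is a toy under stated assumptions: the input is already split into words, only three shell-style keyword pairs are recognized, and treating do as a block opener on its own is a simplification of the real shell grammar. The names CLOSERS and blocks_balanced are hypothetical.

```python
from typing import Iterable, List

# Hypothetical table mapping each opening keyword to its dedicated terminator.
CLOSERS = {"if": "fi", "case": "esac", "do": "done"}


def blocks_balanced(words: Iterable[str]) -> bool:
    """Return True if every opener is closed by its own terminator, properly nested."""
    expected: List[str] = []  # terminators still owed, innermost block last
    for word in words:
        if word in CLOSERS:
            expected.append(CLOSERS[word])
        elif word in CLOSERS.values():
            # A terminator must match the innermost block that is still open.
            if not expected or expected.pop() != word:
                return False
    return not expected


print(blocks_balanced("if x then do y done fi".split()))
print(blocks_balanced("if x then do y fi done".split()))
```

Because each kind of block has its own terminator, a mismatch such as fi appearing where done is owed can be detected at the terminator itself, which is not possible when every block ends with the same closing bracket.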
An interesting variant of this occurs in Modula-2, a Pascal-like language which does away with the difference between single-line and multiline blocks. This allows the block opener ({ or BEGIN) to be skipped for all but the function-level block, requiring only a block-terminating token (} or END). It also fixes the dangling else problem. The custom is to place the end token on the same indent level as the rest of the block, giving a block structure that is very readable.
One advantage of the Fortran approach is that it improves the readability of long, nested, or otherwise complex code. A group of outdents or closing brackets alone provides no contextual cues as to which blocks are being closed, necessitating backtracking and closer scrutiny while debugging. Further, languages that allow a suffix for END-like keywords improve such cues, for example continue versus continue for x, an end-of-loop marker that specifies the index variable (NEXT I versus NEXT), and uniquely named loops (CYCLE X1 versus CYCLE). However, modern source code editors often provide visual indicators, such as syntax highlighting, and features such as code folding to help with these drawbacks.
In the language Scala, early versions allowed curly braces only. Scala 3 added an option to use indenting to structure blocks. Designer Martin Odersky said that this was the single most important way Scala 3 improved his own productivity, that it makes programs over 10% shorter and keeps programmers "in the flow", and advises its use. [6]
Notable programming languages with the off-side rule:
F#: in early versions, when #light is specified; in later versions, when #light "off" is not [7]
Haskell: for where, let, do, or case ... of clauses when braces are omitted
Notable non-programming language, text file formats with significant indentation: