Programming style

Programming style, also known as coding style, is the manner in which source code is written, and the distinctive characteristics of the code that result from it.

Many consider consistent style within a codebase to be valuable because it makes the code easier to read and maintain. Often, a programmer follows style guidelines with the intent of producing code that is consistent not only within their own body of work but also with the work of everyone else using the same guidelines.

Style guidelines come in many forms. They may be written down as coding conventions in a standard, or programmers may adhere to common practices and de facto standards. Guidelines may be descriptive, such as "write clearly – don't be too clever", or prescriptive, such as "indentation is 4 spaces". Various books, such as The Elements of Programming Style, provide code style examples.

As with style in general, code style can be described as many separate aspects such as indentation, naming, and capitalization.

Aspects of code style are often used in the context of a specific programming language or language family. A style aspect used for C may not be appropriate for BASIC. However, some style aspects apply to many languages.

Automation

In some cases, adherence to style can be achieved via software tools. A tool that formats code allows coders to focus on other aspects such as logic and naming. Using such tools can result in a more consistent code style with less effort from the coders.
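As a rough illustration of what such a tool does, the sketch below normalizes white space only. The function name normalize_whitespace and its tab_width parameter are hypothetical, and a real formatter parses the code rather than editing text line by line (this version would, for instance, also expand tabs inside string literals).

```python
def normalize_whitespace(source: str, tab_width: int = 4) -> str:
    """Expand tabs, strip trailing white space, and end the text with one newline."""
    lines = [line.expandtabs(tab_width).rstrip() for line in source.splitlines()]
    # Drop blank lines at the end of the file, then add a single final newline.
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines) + "\n"


# A tab-indented snippet with trailing white space and extra blank lines.
messy = "def f(x):\t\n\treturn x + 1   \n\n\n"
print(normalize_whitespace(messy), end="")
```

Running every file through one such function before committing means these details no longer depend on each coder's habits.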

Style aspects

Aspects of code style include but are not limited to:

Indentation

Indentation style can assist a reader in various ways, including identifying control flow and blocks of code. In some programming languages, indentation is used to delimit blocks of code and is therefore not a matter of style. In languages that ignore white space, indentation does not change the meaning of the code but can affect readability.

For example, formatted in a commonly-used style:

if (hours < 24 && minutes < 60 && seconds < 60) {
    return true;
} else {
    return false;
}

Arguably, poorly formatted:

if  ( hours   < 24
   && minutes < 60
   && seconds < 60
)
{return    true
;}         else
{return   false
;}

Notable indentation styles

ModuLiq

The ModuLiq Zero Indentation Style groups by empty line rather than indentation.

Example:

if (hours < 24 && minutes < 60 && seconds < 60)
return true;

else
return false;

Lua

Lua does not use the traditional curly braces or parentheses; rather, the expression in a conditional statement must be followed by then, and the block must be closed with end.

if hours < 24 and minutes < 60 and seconds < 60 then
    return true
else
    return false
end

Indentation is optional in Lua. The keywords and, or, and not serve as logical operators.

Python

Python relies on indentation to indicate control structure, thus eliminating the need for bracketing (i.e. { and }). On the other hand, copying and pasting Python code can lead to problems, because the indentation level of the pasted code may not match the indentation level of the current line. Reformatting the pasted code can be tedious to do by hand, but some text editors and IDEs have features to do it automatically. Python code can also be rendered unusable when posted on a forum or web page that removes white space, though this problem can be avoided where it is possible to enclose code in white space-preserving tags such as "<pre> ... </pre>" (for HTML) or "[code]" ... "[/code]" (for bbcode).

if hours < 24 and minutes < 60 and seconds < 60:
    return True
else:
    return False

Notice that Python starts a block with a colon (:).

Python programmers tend to follow a commonly agreed style guide known as PEP 8 [1]. There are tools designed to automate PEP 8 compliance.
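The re-indentation that editors perform on pasted Python code can be approximated with the standard textwrap module. This is a minimal sketch, not how any particular editor implements the feature; the helper name reindent is hypothetical, and it assumes the pasted snippet is indented consistently with spaces.

```python
import textwrap


def reindent(snippet: str, target_indent: str) -> str:
    """Remove the snippet's common leading white space, then indent it to the target level."""
    return textwrap.indent(textwrap.dedent(snippet), target_indent)


# Code copied from a deeply nested location (8 spaces of indentation)...
pasted = "        if hours < 24:\n            return True\n"
# ...and pasted where the surrounding block is indented by 4 spaces.
print(reindent(pasted, "    "), end="")
```

The relative indentation inside the snippet is preserved, which is what keeps the block structure intact.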

Haskell

Like Python, Haskell has the off-side rule. It has a two-dimensional syntax in which indentation is meaningful and defines blocks (although an alternate syntax uses curly braces and semicolons).

Haskell is a declarative language: a Haskell script consists of declarations rather than statements.

Example:

f x y =
    let c_1 = 1
        c_2 = 2
    in  c_1 * x + c_2 * y

may be written in one line as:

f x y = let {c_1 = 1; c_2 = 2} in c_1 * x + c_2 * y

Haskell encourages the use of literate programming, where extended text explains the genesis of the code. In literate Haskell scripts (named with the lhs extension), everything is a comment except blocks marked as code. The program can be written in LaTeX, in which case the code environment marks what is code. Alternatively, each active code paragraph can be marked by preceding and following it with an empty line and starting each line of code with a greater-than sign and a space. Here is an example using LaTeX markup:

The function \verb+isValidDate+ test if date is valid
\begin{code}
isValidDate :: Date -> Bool
isValidDate date = hh>=0  && mm>=0 && ss>=0
                 && hh<24 && mm<60 && ss<60
 where (hh,mm,ss) = fromDate date
\end{code}
observe that in this case the overloaded function is \verb+fromDate :: Date -> (Int,Int,Int)+.

And an example using plain text:

The function isValidDate test if date is valid

> isValidDate :: Date -> Bool
> isValidDate date = hh>=0  && mm>=0 && ss>=0
>                  && hh<24 && mm<60 && ss<60
>  where (hh,mm,ss) = fromDate date

observe that in this case the overloaded function is
fromDate :: Date -> (Int,Int,Int).

Vertical alignment

Some programmers consider it valuable to align similar elements vertically (as tabular, in columns), citing that it can make typo-generated bugs more obvious.

For example, unaligned:

$search = array('a', 'b', 'c', 'd', 'e');
$replacement = array('foo', 'bar', 'baz', 'quux');

$value = 0;
$anothervalue = 1;
$yetanothervalue = 2;

Aligned:

$search      = array('a',   'b',   'c',   'd',   'e');
$replacement = array('foo', 'bar', 'baz', 'quux');

$value           = 0;
$anothervalue    = 1;
$yetanothervalue = 2;

Unlike the unaligned code, the aligned code implies that the search and replace values are related since they have corresponding elements. As there is one more value for search than replacement, if this is a bug, it is more likely to be spotted via visual inspection.

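Cited disadvantages of vertical alignment include the effort needed to keep columns aligned as the code changes, since editing one line can force edits to its neighbours.

The burden of maintaining alignment can be alleviated by a tool that provides support (e.g. for elastic tabstops), although that creates a reliance on such tools.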

As an example, simple refactoring operations that rename "$replacement" to "$r" and "$anothervalue" to "$a" result in:

$search      = array('a',   'b',   'c',   'd',   'e');
$r = array('foo', 'bar', 'baz', 'quux');

$value           = 0;
$a    = 1;
$yetanothervalue = 2;

With unaligned formatting, these changes do not have such a dramatic, inconsistent or undesirable effect:

$search = array('a', 'b', 'c', 'd', 'e');
$r = array('foo', 'bar', 'baz', 'quux');

$value = 0;
$a = 1;
$yetanothervalue = 2;
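The kind of tool support mentioned above can be sketched in a few lines of Python. The function align_assignments is hypothetical and deliberately naive: it assumes every line in the group contains an "=" and splits on the first one, without understanding the language being formatted.

```python
def align_assignments(lines: list[str]) -> list[str]:
    """Pad the left-hand sides so that the first '=' of every line sits in one column."""
    split = [line.split("=", 1) for line in lines]
    width = max(len(left.rstrip()) for left, _ in split)
    return [f"{left.rstrip():<{width}} = {right.strip()}" for left, right in split]


# The second group of assignments after the rename, in unaligned form.
block = ["$value = 0;", "$a = 1;", "$yetanothervalue = 2;"]
for line in align_assignments(block):
    print(line)
```

Re-running such a function after every rename restores the columns, but it also changes lines that were not otherwise edited, which is the dependency across lines that the renaming example illustrates.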

White space

A free-format language ignores white space (spaces, tabs and new lines), so the programmer is free to style the code in different ways without affecting its meaning. Generally, the programmer uses a style that is considered to enhance readability.

The following two code snippets are the same logically, but differ in white space.

int i;
for(i=0;i<10;++i){
    printf("%d",i*i+i);
}

versus

int i;
for (i = 0; i < 10; ++i) {
    printf("%d", i * i + i);
}
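The claim that spacing between tokens does not change meaning can be checked mechanically by comparing parse results. The sketch below uses Python's standard ast module for this; note that Python as a whole is not free-format, since its leading indentation is significant, so the comparison is limited to spacing within a single line. The variable names are illustrative.

```python
import ast

compact = "total=i*i+i"
spaced = "total = i * i + i"

# ast.dump omits line and column positions by default, so equal dumps
# mean the two spellings parse to the same syntax tree.
same_tree = ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced))
print(same_tree)
```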

The use of tabs for white space is debated. Alignment issues arise from differing tab stops in different environments and from mixed use of tabs and spaces.

As an example, one programmer prefers tab stops of four and has their toolset configured this way, and uses these to format their code.

int     ix;     // Index to scan array
long    sum;    // Accumulator for sum

Another programmer prefers tab stops of eight, and their toolset is configured this way. When someone else examines the original person's code, they may well find it difficult to read.

int             ix;             // Index to scan array
long    sum;    // Accumulator for sum

One widely used solution to this issue is to forbid the use of tabs for alignment, or to set rules on how tab stops must be configured. Note that tabs work fine provided they are used consistently, restricted to logical indentation, and not used for alignment:

class MyClass {
    int foobar(
        int qux, // first parameter
        int quux); // second parameter
    int foobar2(
        int qux, // first parameter
        int quux, // second parameter
        int quuux); // third parameter
};
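The tab-stop mismatch described in this section can be reproduced with Python's str.expandtabs, which renders tab characters for a given tab width. The exact placement of tabs in the source string below is an assumption, chosen so that the columns line up at a width of four; comment_columns is a hypothetical helper.

```python
# Declarations aligned with tab characters by a programmer using tab stops of four.
source = "int\t\tix;\t\t// Index to scan array\nlong\tsum;\t// Accumulator for sum"

for tab_width in (4, 8):
    print(f"tab stops every {tab_width} columns:")
    for line in source.splitlines():
        print(line.expandtabs(tab_width))


def comment_columns(text: str, tab_width: int) -> list[int]:
    """Return the column at which the comment starts on each line."""
    return [line.expandtabs(tab_width).index("//") for line in text.splitlines()]
```

With these tabs, the two comments start in the same column at a width of four and in different columns at a width of eight.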

References

  1. "PEP 0008 -- Style Guide for Python Code". python.org.