Programming style

Programming style, also known as coding style or code style, is a set of rules or guidelines that governs the layout of source code. Programming style may also refer to a quality aspect of code that is interpreted subjectively.

Some claim that following a particular programming style helps programmers read and understand code and avoid introducing errors.

The Elements of Programming Style, written in the 1970s, provides examples in Fortran and PL/I.

The style used in a particular codebase is often based on the coding conventions of a company or organization, or the preferences of the programmer.

A style is often designed for a specific programming language or language family. For example, a style used for C may not be appropriate for BASIC. However, some rules are applied to many languages.

Automation

Tools are available that format source code, leaving coders to concentrate on other aspects such as logic and naming. Using such tools can save development time and achieve a high level of consistency.
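As a rough illustration of what such a tool does, the following sketch re-emits Python source in one uniform layout using the standard library's ast.parse and ast.unparse (Python 3.9 or later). The function name normalize and the sample input are assumptions made for this example. Unlike a real formatter, this approach discards comments and has no configurable style rules.

```python
import ast


def normalize(source: str) -> str:
    """Parse the source and re-emit it in the layout chosen by ast.unparse.

    Illustrative only: comments are lost, and the layout cannot be configured.
    """
    return ast.unparse(ast.parse(source))


# Hypothetical, inconsistently spaced input.
messy = "total=1+2 ;  names=[ 'a','b' ]\nif total>2 :print( names )"

print(normalize(messy))
```

Running the result through normalize a second time changes nothing, which is the kind of consistency a formatter is meant to provide.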

Style aspects

Aspects of code style include but are not limited to:

Indentation

Indentation style can assist a reader in various ways, including identifying control flow and blocks of code. In some programming languages, indentation is used to delimit blocks of code and therefore is not a matter of style. In languages that ignore white space, indentation can affect readability.

For example, formatted in a commonly-used style:

    if (hours < 24 && minutes < 60 && seconds < 60) {
        return true;
    } else {
        return false;
    }

Arguably, poorly formatted:

    if  ( hours   < 24
       && minutes < 60
       && seconds < 60
    )
    {return    true
    ;}         else
    {return   false
    ;}

Notable indentation styles

ModuLiq

The ModuLiq Zero Indentation Style groups by empty line rather than indentation.

Example:

    if (hours < 24 && minutes < 60 && seconds < 60)
    return true;

    else
    return false;

Lua

Lua does not use the traditional curly braces or parentheses; rather, the expression in a conditional statement must be followed by then, and the block must be closed with end.

    if hours < 24 and minutes < 60 and seconds < 60 then
        return true
    else
        return false
    end

Indentation is optional in Lua. The keywords and, or, and not function as logical operators.

Python

Python relies on indentation to indicate control structure, thus eliminating the need for bracketing (i.e., { and }). On the other hand, copying and pasting Python code can lead to problems, because the indentation level of the pasted code may not match the indentation level of the current line. Re-indenting by hand can be tedious, but some text editors and IDEs can do it automatically. Python code can also be rendered unusable when posted on a forum or web page that removes white space, though this problem can be avoided where it is possible to enclose code in white space-preserving tags such as "<pre> ... </pre>" (for HTML) or "[code] ... [/code]" (for BBCode).

    if hours < 24 and minutes < 60 and seconds < 60:
        return True
    else:
        return False

Notice that Python starts a block with a colon (:).

Python programmers tend to follow a commonly agreed style guide known as PEP 8.[1] There are tools designed to automate PEP 8 compliance.
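The re-indentation that editors perform on pasted Python code can be approximated with the standard textwrap module. This is a minimal sketch; the function name reindent, its parameters, and the sample block are assumptions made for illustration, and it handles only blocks indented with spaces.

```python
import textwrap


def reindent(block: str, level: int, width: int = 4) -> str:
    """Strip the block's common leading white space, then indent it to `level`."""
    return textwrap.indent(textwrap.dedent(block), " " * (level * width))


# A block copied from two levels deep, to be pasted at one level deep.
pasted = "        if seconds < 60:\n            return True\n"

print(reindent(pasted, 1), end="")
```

The relative indentation inside the block is preserved; only the common prefix changes.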

Haskell

Like Python, Haskell follows the off-side rule. It has a two-dimensional syntax in which indentation is meaningful and defines blocks (although an alternate syntax uses curly braces and semicolons).

Haskell is a declarative language: a Haskell script consists of declarations rather than statements.

Example:

    f x y =
        let c_1 = 1
            c_2 = 2
        in  c_1 * x + c_2 * y

may be written in one line as:

    f x y = let { c_1 = 1; c_2 = 2 } in c_1 * x + c_2 * y

Haskell encourages the use of literate programming, where extended text explains the genesis of the code. In literate Haskell scripts (named with the .lhs extension), everything is a comment except blocks marked as code. The program can be written in LaTeX, in which case the code environment marks what is code. Alternatively, each active code paragraph can be marked by preceding and following it with an empty line and starting each line of code with a greater-than sign and a space. Here is an example using LaTeX markup:

    The function \verb+isValidDate+ test if date is valid
    \begin{code}
    isValidDate :: Date -> Bool
    isValidDate date = hh>=0 && mm>=0 && ss>=0
                     && hh<24 && mm<60 && ss<60
     where (hh,mm,ss) = fromDate date
    \end{code}
    observe that in this case the overloaded function is \verb+fromDate :: Date -> (Int,Int,Int)+.

And an example using plain text:

    The function isValidDate test if date is valid

    > isValidDate :: Date -> Bool
    > isValidDate date = hh>=0 && mm>=0 && ss>=0
    >                  && hh<24 && mm<60 && ss<60
    >  where (hh,mm,ss) = fromDate date

    observe that in this case the overloaded function is fromDate :: Date -> (Int,Int,Int).
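To make the plain-text convention concrete, the following Python sketch separates code from commentary by keeping only lines that begin with a greater-than sign. The function name extract_bird_code is an assumption made for this example, and the sketch is not how a Haskell compiler processes .lhs files: it does not check the blank-line requirement and ignores LaTeX code environments.

```python
def extract_bird_code(text: str) -> str:
    """Return the lines marked with a leading '> ', with the marker removed."""
    code = []
    for line in text.splitlines():
        if line.startswith("> "):
            code.append(line[2:])
        elif line == ">":
            # A bare marker stands for an empty line of code.
            code.append("")
    return "\n".join(code)


literate = """The function isValidDate test if date is valid

> isValidDate :: Date -> Bool
> isValidDate date = hh>=0 && mm>=0 && ss>=0
>                  && hh<24 && mm<60 && ss<60
>  where (hh,mm,ss) = fromDate date

observe that in this case the overloaded function is fromDate.
"""

print(extract_bird_code(literate))
```

Everything not marked as code, including the explanatory sentences, is dropped.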

Vertical alignment

Some programmers consider it valuable to align similar elements vertically (in columns, as in a table), arguing that it can make typo-generated bugs more obvious.

For example, unaligned:

    $search = array('a', 'b', 'c', 'd', 'e');
    $replacement = array('foo', 'bar', 'baz', 'quux');

    $value = 0;
    $anothervalue = 1;
    $yetanothervalue = 2;

aligned:

    $search      = array('a',   'b',   'c',   'd',   'e');
    $replacement = array('foo', 'bar', 'baz', 'quux');

    $value           = 0;
    $anothervalue    = 1;
    $yetanothervalue = 2;

Unlike the unaligned code, the aligned code implies that the search and replace values are related since they have corresponding elements. As there is one more value for search than replacement, if this is a bug, it is more likely to be spotted via visual inspection.

Cited disadvantages of vertical alignment include the effort needed to maintain it, since a change to one line can force edits to every other line in the same column. This burden can be alleviated by a tool that provides support (e.g. for elastic tabstops), although that creates a reliance on such tools.

As an example, simple refactoring operations to rename "$replacement" to "$r" and "$anothervalue" to "$a" result in:

    $search      = array('a',   'b',   'c',   'd',   'e');
    $r = array('foo', 'bar', 'baz', 'quux');

    $value           = 0;
    $a    = 1;
    $yetanothervalue = 2;

With unaligned formatting, these changes do not have such a dramatic, inconsistent or undesirable effect:

    $search = array('a', 'b', 'c', 'd', 'e');
    $r = array('foo', 'bar', 'baz', 'quux');

    $value = 0;
    $a = 1;
    $yetanothervalue = 2;
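The kind of tool support mentioned above can be sketched in a few lines of Python. The helper align_assignments is hypothetical and deliberately naive: it assumes every line contains a single assignment and splits on the first equals sign, so it would mishandle operators such as == or =>.

```python
def align_assignments(lines):
    """Pad the left-hand side of each 'name = value' line to a common width."""
    split = [line.split("=", 1) for line in lines]
    width = max(len(lhs.rstrip()) for lhs, _ in split)
    return [f"{lhs.rstrip():<{width}} = {rhs.strip()}" for lhs, rhs in split]


# The renamed variables from the refactoring example, treated as plain text.
renamed = ["$value = 0;", "$a = 1;", "$yetanothervalue = 2;"]

for line in align_assignments(renamed):
    print(line)
```

Re-running such a helper after each rename restores the columns, at the cost of touching lines that the rename itself did not change.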

White space

A free-format language ignores white space (spaces, tabs, and new lines), so the programmer is free to style the code in different ways without affecting its meaning. Generally, the programmer uses a style that is considered to enhance readability.

The following two code snippets are the same logically, but differ in white space.

    int i;
    for(i=0;i<10;++i){
        printf("%d",i*i+i);
    }

versus

    int i;
    for (i = 0; i < 10; ++i) {
        printf("%d", i * i + i);
    }
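The claim that two layouts are logically the same can be checked mechanically by comparing parse trees. The sketch below does this for Python source using the standard ast module. Python is not fully free-format, since line breaks and leading indentation are significant, but spacing within a line is ignored. The function name same_structure and the two snippets are assumptions made for this example.

```python
import ast


def same_structure(a: str, b: str) -> bool:
    """Return True if both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# A Python loop comparable to the C loop above, in two layouts.
compact = "for i in range(10):print(i*i+i)"
spaced = "for i in range(10):\n    print(i * i + i)\n"

print(same_structure(compact, spaced))
```

By default ast.dump omits line and column positions, so only the structure of the code is compared.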

The use of tabs for white space is debated. Alignment issues arise from differing tab stops in different environments and from mixed use of tabs and spaces.

As an example, one programmer prefers tab stops of four, has their toolset configured this way, and uses tabs to format their code.

    int     ix;     // Index to scan array
    long    sum;    // Accumulator for sum

Another programmer prefers tab stops of eight, and their toolset is configured this way. When this programmer examines the first programmer's code, they may well find it difficult to read.

    int             ix;             // Index to scan array
    long    sum;    // Accumulator for sum

One widely used solution to this issue is to forbid the use of tabs for alignment or to set rules on how tab stops must be configured. Note that tabs work fine provided they are used consistently, restricted to logical indentation, and not used for alignment:

    class MyClass {
        int foobar(
            int qux, // first parameter
            int quux); // second parameter
        int foobar2(
            int qux, // first parameter
            int quux, // second parameter
            int quuux); // third parameter
    };
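The effect of differing tab stops on tab-aligned comments can be reproduced with Python's str.expandtabs. In this sketch, the two strings are an assumed reconstruction of the declarations shown earlier as typed with tab characters, and comment_column is a hypothetical helper.

```python
# Declarations aligned with tabs by someone using tab stops of four.
line_int = "int\t\tix;\t\t// Index to scan array"
line_long = "long\tsum;\t// Accumulator for sum"


def comment_column(line: str, tabsize: int) -> int:
    """Return the column where the comment starts once tabs are expanded."""
    return line.expandtabs(tabsize).index("//")


for tabsize in (4, 8):
    print(tabsize, comment_column(line_int, tabsize), comment_column(line_long, tabsize))
# prints "4 16 16" then "8 32 16"
```

With tab stops of four both comments start in the same column; with tab stops of eight they drift apart, which is the misalignment described above.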

References

  1. "PEP 0008 -- Style Guide for Python Code". python.org.