Pseudocode

Last updated

In computer science, pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actions and conditions. [1] [2] Although pseudocode shares features with regular programming languages, it is intended for human reading rather than machine control. Pseudocode typically omits details that are essential for machine implementation of the algorithm, meaning that pseudocode can only be verified by hand. [3] The programming language is augmented with natural language description details, where convenient, or with compact mathematical notation. The purpose of using pseudocode is that it is easier for people to understand than conventional programming language code, and that it is an efficient and environment-independent description of the key principles of an algorithm. It is commonly used in textbooks and scientific publications to document algorithms and in planning of software and other algorithms.

Contents

No broad standard for pseudocode syntax exists, as a program in pseudocode is not an executable program; however, certain limited standards exist (such as for academic assessment). Pseudocode resembles skeleton programs, which can be compiled without errors. Flowcharts, drakon-charts and Unified Modelling Language (UML) charts can be thought of as a graphical alternative to pseudocode, but need more space on paper. Languages such as HAGGIS bridge the gap between pseudocode and code written in programming languages.

Application

Pseudocode is commonly used in textbooks and scientific publications related to computer science and numerical computation to describe algorithms in a way that is accessible to programmers regardless of their familiarity with specific programming languages. Textbooks often include an introduction explaining the conventions in use, and the detail of pseudocode may sometimes approach that of formal programming languages.

Programmers frequently begin implementing an unfamiliar algorithm by drafting it in pseudocode, then translating it into a programming language while adapting it to fit the larger program. This top-down structuring approach often starts with a pseudocode sketch refined into executable code. Pseudocode is also used in standardization; for example, the MPEG standards rely on formal C-like pseudocode, these standards cannot be understood without grasping the details of the code. [4]

Syntax

Pseudocode generally does not actually obey the syntax rules of any particular language; there is no systematic standard form. Some writers borrow style and syntax from control structures from some conventional programming language, although this is discouraged. [5] [6] Some syntax sources include Fortran, Pascal, BASIC, C, C++, Java, Lisp, and ALGOL. Variable declarations are typically omitted. Function calls and blocks of code, such as code contained within a loop, are often replaced by a one-line natural language sentence.

Depending on the writer, pseudocode may therefore vary widely in style, from a near-exact imitation of a real programming language at one extreme, to a description approaching formatted prose at the other.

This flexibility brings both major advantages and drawbacks: on the positive side, no executable programming language "can beat the convenience of inventing new constructs as needed and letting the reader try to deduce their meaning from informal explanations", on the negative, "untested code is usually incorrect". [7]

An example of pseudocode (for the mathematical game fizz buzz)

Pascal style:

procedurefizzbuzz;fori:=1to100doprint_number:=true;ifiisdivisibleby3thenbeginprint"Fizz";print_number:=false;end;ifiisdivisibleby5thenbeginprint"Buzz";print_number:=false;end;ifprint_number,printi;printanewline;end

C style:

fizzbuzz(){for(i=1;i<=100;i++){print_number=true;if(iisdivisibleby3){print"Fizz";print_number=false;}if(iisdivisibleby5){print"Buzz";print_number=false;}if(print_number)printi;printanewline;}}

Python style:

deffizzbuzz():foriinrange(1,101):print_number=trueifiisdivisibleby3:print"Fizz"print_number=falseifiisdivisibleby5:print"Buzz"print_number=falseifprint_number:printiprintanewline

Mathematical style pseudocode

In numerical computation, pseudocode often consists of mathematical notation, typically from matrix and set theory, mixed with the control structures of a conventional programming language, and perhaps also natural language descriptions. This is a compact and often informal notation that can be understood by a wide range of mathematically trained people, and is frequently used as a way to describe mathematical algorithms. For example, the sum operator (capital-sigma notation) or the product operator (capital-pi notation) may represent a for-loop and a selection structure in one expression:

Return 

Normally non-ASCII typesetting is used for the mathematical equations, for example by means of markup languages, such as TeX or MathML, or proprietary formula editors.

Mathematical style pseudocode is sometimes referred to as pidgin code, for example pidgin ALGOL (the origin of the concept), pidgin Fortran , pidgin BASIC , pidgin Pascal , pidgin C , and pidgin Lisp .

Common mathematical symbols

Type of operationSymbolExample
Assignment← or :=c ← 2πr, c := 2πr
Comparison=, ≠, <, >, ≤, ≥
Arithmetic+, −, ×, /, mod
Floor/ceiling⌊, ⌋, ⌈, ⌉a ← ⌊b⌋ + ⌈c
Logicaland, or
Sums, productsΣ Πh ← ΣaA 1/a

Example

The following is a longer example of mathematical-style pseudocode, for the Ford–Fulkerson algorithm:

algorithm ford-fulkerson isinput: Graph G with flow capacity c,             source node s,             sink node toutput: Flow f such that f is maximal from s to t(Note that f(u,v) is the flow from node u to node v, and c(u,v) is the flow capacity from node u to node v)for each edge (u, v) inGEdof(u, v) ← 0         f(v, u) ← 0      while there exists a path p from s to tin the residual network Gfdo         let cf be the flow capacity of the residual network Gfcf(p) ← min{cf(u, v) | (u, v) inp}         for each edge (u, v) inpdof(u, v)f(u, v) + cf(p)             f(v, u) ← −f(u, v)returnf

Machine compilation of pseudocode style languages

Natural language grammar in programming languages

Various attempts to bring elements of natural language grammar into computer programming have produced programming languages such as HyperTalk, Lingo, AppleScript, SQL, Inform, and to some extent Python. In these languages, parentheses and other special characters are replaced by prepositions, resulting in quite verbose code. These languages are typically dynamically typed, meaning that variable declarations and other boilerplate code can be omitted. Such languages may make it easier for a person without knowledge about the language to understand the code and perhaps also to learn the language. However, the similarity to natural language is usually more cosmetic than genuine. The syntax rules may be just as strict and formal as in conventional programming, and do not necessarily make development of the programs easier.

Mathematical programming languages

An alternative to using mathematical pseudocode (involving set theory notation or matrix operations) for documentation of algorithms is to use a formal mathematical programming language that is a mix of non-ASCII mathematical notation and program control structures. Then the code can be parsed and interpreted by a machine.

Several formal specification languages include set theory notation using special characters. Examples are:

Some array programming languages include vectorized expressions and matrix operations as non-ASCII formulas, mixed with conventional control structures. Examples are:

See also

Related Research Articles

<span class="mw-page-title-main">Algorithm</span> Sequence of operations for a task

In mathematics and computer science, an algorithm is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes and deduce valid inferences.

<span class="mw-page-title-main">ALGOL</span> Family of programming languages

ALGOL is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the Association for Computing Machinery (ACM) in textbooks and academic sources for more than thirty years.

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.

<span class="mw-page-title-main">Regular expression</span> Sequence of characters that forms a search pattern

A regular expression, sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

A bracket is either of two tall fore- or back-facing punctuation marks commonly used to isolate a segment of text or data from its surroundings. They come in four main pairs of shapes, as given in the box to the right, which also gives their names, that vary between British and American English. "Brackets", without further qualification, are in British English the (...) marks and in American English the [...] marks.

In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.

Iteration is the repetition of a process in order to generate a sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is then the starting point of the next iteration.

In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language. They are extensions of the basic Backus–Naur form (BNF) metasyntax notation. The earliest EBNF was developed by Niklaus Wirth, incorporating some of the concepts from Wirth syntax notation. Today, many variants of EBNF are in use. The International Organization for Standardization adopted an EBNF Standard, ISO/IEC 14977, in 1996. According to Zaytsev, however, this standard "only ended up adding yet another three dialects to the chaos" and, after noting its lack of success, also notes that the ISO EBNF is not even used in all ISO standards. Wheeler argues against using the ISO standard when using an EBNF and recommends considering alternative EBNF notations such as the one from the W3C Extensible Markup Language (XML) 1.0 . This article uses EBNF as specified by the ISO for examples applying to all EBNFs. Other EBNF variants use somewhat different syntactic conventions.

A modeling language is any artificial language that can be used to express data, information or knowledge or systems in a structure that is defined by a consistent set of rules. The rules are used for interpretation of the meaning of components in the structure of a programming language.

<span class="mw-page-title-main">Flowchart</span> Diagram that represents a workflow or process

A flowchart is a type of diagram that represents a workflow or process. A flowchart can also be defined as a diagrammatic representation of an algorithm, a step-by-step approach to solving a task.

The vertical bar, |, is a glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke, pipe, bar, or, vbar, and others.

In computer programming, pidgin code is a mixture of several programming languages in the same program, or pseudocode that is a mixture of a programming language with natural language descriptions. Hence the name: the mixture is a programming language analogous to a pidgin in natural languages.

Fortress is a discontinued experimental programming language for high-performance computing, created by Sun Microsystems with funding from DARPA's High Productivity Computing Systems project. One of the language designers was Guy L. Steele Jr., whose previous work includes Scheme, Common Lisp, and Java.

Unified English Braille Code is an English language Braille code standard, developed to encompass the wide variety of literary and technical material in use in the English-speaking world today, in uniform fashion.

This is an alphabetical list of articles pertaining specifically to software engineering.

<span class="mw-page-title-main">Comment (computer programming)</span> Explanatory note in the source code of a computer program

In computer programming, a comment is a human-readable explanation or annotation in the source code of a computer program. They are added with the purpose of making the source code easier for humans to understand, and are generally ignored by compilers and interpreters. The syntax of comments in various programming languages varies considerably.

<span class="mw-page-title-main">DRAKON</span> Algorithm mapping tool

DRAKON is a free and open source algorithmic visual programming and modeling language developed as part of the defunct Soviet Union Buran space program in 1986 following the need in increase of software development productivity. The visual language provides a uniform way to represent processes in flowcharts.

A mathematical markup language is a computer notation for representing mathematical formulae, based on mathematical notation. Specialized markup languages are necessary because computers normally deal with linear text and more limited character sets. A formally standardized syntax also allows a computer to interpret otherwise ambiguous content, for rendering or even evaluating. For computer-interpretable syntaxes, the most popular are TeX/LaTeX, MathML, OpenMath and OMDoc.

Caret is the name used familiarly for the character ^ provided on most QWERTY keyboards by typing ⇧ Shift+6. The symbol has a variety of uses in programming and mathematics. The name "caret" arose from its visual similarity to the original proofreader's caret, , a mark used in proofreading to indicate where a punctuation mark, word, or phrase should be inserted into a document. The ASCII standard (X3.64.1977) calls it a "circumflex"; the Unicode standard calls it a "circumflex accent", although it is no longer practicable for that purpose.

References

  1. Reisig 2007, p. 23, Pseudocode Programs and Their Semantics.
  2. An often-repeated definition of pseudocode since at least 2003 is "a detailed yet readable description of what a computer program or algorithm must do, expressed in a formally-styled natural language"
  3. Ulate-Caballero, Bryan Alexander; Berrocal-Rojas, Allan; Hidalgo-Céspedes, Jeisson (2021). "Concurrent and Distributed Pseudocode: A Systematic Literature Review". 2021 XLVII Latin American Computing Conference (CLEI). pp. 1–10. doi:10.1109/CLEI53233.2021.9640222. ISBN   978-1-6654-9503-5.
  4. Mitchell et al. 1996, p. 105.
  5. McConnell, Steve (2004). Code Complete. Pearson Education. p. 54. ISBN   978-0-7356-1967-8. Avoid syntactic elements from the target programming language
  6. Invitation to Computer Science, 8th Edition by Schneider/Gersting, "Keep statements language independent" as quoted in this stackexchange question
  7. Lamport, Leslie (2 January 2009). "The PlusCal Algorithm Language" (PDF). Microsoft Research. Retrieved 28 May 2024.

Further reading