N-version programming

N-version programming (NVP), also known as multiversion programming or multiple-version dissimilar software, is a method or process in software engineering where multiple functionally equivalent programs are independently generated from the same initial specifications. [1] The concept of N-version programming was introduced in 1977 by Liming Chen and Algirdas Avizienis with the central conjecture that the "independence of programming efforts will greatly reduce the probability of identical software faults occurring in two or more versions of the program". [1] [2] The aim of NVP is to improve the reliability of software operation by building in fault tolerance or redundancy. [1]

NVP approach

The general steps of N-version programming are:

  1. An initial specification of the intended functionality of the software is developed. The specification should unambiguously define the functions, the data formats (including the comparison vectors, or c-vectors, and the comparison status indicators, or cs-indicators), the cross-check points (cc-points), the comparison algorithm, and the responses to the outcomes of the comparison algorithm. [1] [2]
  2. From the specification, two or more versions of the program are developed independently, each by a team that does not interact with the others. [1] The implementations of these functionally equivalent programs use different algorithms and programming languages. [1] At various points in the program, special mechanisms are built into the software which allow it to be governed by the N-version execution environment (NVX). [2] These mechanisms include comparison vectors (c-vectors, data structures representing the program's state), comparison status indicators (cs-indicators), and synchronization mechanisms. [1] The resulting programs are collectively called N-version software (NVS). [2]
  3. An N-version execution environment (NVX) is developed which runs the N-version software and reaches a final decision for the N-version programs as a whole, given the output of each individual version. [2] The decision algorithm can range from simply accepting the most frequently occurring output (for instance, if a majority of versions agree on some output, it is likely to be correct) to considerably more complex schemes. [3] A minimal sketch of such a majority-voting NVX is given after this list.
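
The decision mechanism of step 3 can be illustrated with a small example. The following Python fragment is a minimal, hypothetical sketch rather than an implementation from the cited papers: three toy "versions" compute a square root with deliberately different algorithms, each delivers its result as a one-element comparison vector at a single cross-check point, and the NVX accepts the majority output.

    from collections import Counter
    import math

    def version_a(x):
        # Version 1: library routine, standing in for one independent implementation.
        return round(math.sqrt(x), 6)

    def version_b(x):
        # Version 2: Newton's method, a deliberately different algorithm.
        guess = x if x > 1 else 1.0
        for _ in range(60):
            guess = 0.5 * (guess + x / guess)
        return round(guess, 6)

    def version_c(x):
        # Version 3: bisection, yet another independent algorithm.
        lo, hi = 0.0, max(x, 1.0)
        for _ in range(200):
            mid = (lo + hi) / 2
            if mid * mid < x:
                lo = mid
            else:
                hi = mid
        return round((lo + hi) / 2, 6)

    def nvx_decision(versions, x):
        # Cross-check point: collect one c-vector (here just the rounded result)
        # from every version, then take a majority vote over the c-vectors.
        c_vectors = [v(x) for v in versions]
        value, count = Counter(c_vectors).most_common(1)[0]
        if count >= len(versions) // 2 + 1:
            return value  # majority agreement: accept this output
        raise RuntimeError("no majority among versions: %r" % (c_vectors,))

    print(nvx_decision([version_a, version_b, version_c], 2.0))  # 1.414214

A production NVX would differ in important ways: the c-vectors would carry richer program state together with cs-indicators, the versions would run concurrently and synchronize at every cc-point, and a disagreement would trigger the recovery response defined in the specification rather than an exception.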

Criticisms

The central conjecture of independence has been challenged empirically. In a widely cited 1986 experiment, Knight and Leveson had twenty-seven versions of a program developed independently from a common specification and found that the versions failed on the same inputs far more often than a model of statistically independent failures would predict, since developers tend to make similar mistakes on the same difficult parts of the problem. [5] [6] It has also been argued that, for a fixed development budget, the effort divided among N versions may buy less reliability than the same effort invested in a single, simpler, well-verified version. [7]

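To see what is at stake, consider a hypothetical 2-out-of-3 configuration in which each version fails on a given input with probability p = 0.01. Under the independence conjecture the majority voter fails only when at least two versions fail together, with probability 3p²(1 − p) + p³ ≈ 3 × 10⁻⁴; if coincident failures are common, as the Knight–Leveson data indicate, the realized improvement is far smaller. The short calculation below is illustrative only and is not taken from the cited papers.

    # Failure probability of a 2-out-of-3 majority voter, assuming each
    # version fails independently with probability p on a given input.
    p = 0.01
    voter_failure = 3 * p**2 * (1 - p) + p**3
    print(voter_failure)  # ~0.000298, versus 0.01 for a single version
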
Applications

N-version programming has been applied to software for railway switching, flight control computations on modern airliners, electronic voting (the SAVE System), and the detection of zero-day exploits, among other uses. [2] [3] [4]

See also

  - Dependability
  - Fault tolerance
  - Software fault tolerance
  - Triple modular redundancy

References

  1. Chen, Liming; Avizienis, Algirdas. "N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation". Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995: Highlights from Twenty-Five Years, 27–30 June 1995, pp. 113–.
  2. Avizienis, A. A. "The Methodology of N-Version Programming" (archived 2005-11-03 at the Wayback Machine). Software Fault Tolerance, edited by M. Lyu, John Wiley & Sons, 1995.
  3. Liburd, Soyini. An N-version electronic voting system (Thesis). Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
  4. Nagy, Lajos; Ford, Richard; Allen, William. "N-Version Programming for the Detection of Zero-day Exploits". The 2006 IEEE Topical Conference on Cybersecurity, Daytona Beach, Florida, April 2006.
  5. Knight, J. C.; Leveson, N. G. "An experimental evaluation of the assumption of independence in multiversion programming". IEEE Transactions on Software Engineering 12 (1), January 1986, pp. 96–109.
  6. Knight, J. C.; Leveson, N. G. "A reply to the criticisms of the Knight & Leveson experiment". SIGSOFT Software Engineering Notes 15 (1), January 1990, pp. 24–35.
  7. Sha, L. "Using simplicity to control complexity". IEEE Software 18 (4), July 2001, pp. 20–28. doi:10.1109/MS.2001.936213. ISSN 1937-4194.