Immunity-aware programming

Last updated

Immunity-aware programming is a programming technique used when writing firmware for an embedded system to improve tolerance of transient errors that would otherwise lead to failure. Transient errors are typically caused by single event upsets, insufficient power, or by strong electromagnetic signals transmitted by some other source device.

Immunity-aware programming is an example of defensive programming and EMC-aware programming. Although most of these techniques apply to the software in the "victim" device to make it more reliable, a few of these techniques apply to software in the "source" device to make it emit less unwanted noise.

See also

Related Research Articles

<span class="mw-page-title-main">Electromagnetic compatibility</span>

Electromagnetic compatibility (EMC) is the ability of electrical equipment and systems to function acceptably in their electromagnetic environment, by limiting the unintentional generation, propagation and reception of electromagnetic energy which may cause unwanted effects such as electromagnetic interference (EMI) or even physical damage in operational equipment. The goal of EMC is the correct operation of different equipment in a common electromagnetic environment. It is also the name given to the associated branch of electrical engineering.

<span class="mw-page-title-main">Embedded system</span> Computer system with a dedicated function

An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts. Because an embedded system typically controls physical operations of the machine that it is embedded within, it often has real-time computing constraints. Embedded systems control many devices in common use today. In 2009, it was estimated that ninety-eight percent of all microprocessors manufactured were used in embedded systems.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.

BiiN was a company created out of a joint research project by Intel and Siemens to develop fault tolerant high-performance multi-processor computers build on custom microprocessor designs. BiiN was an outgrowth of the Intel iAPX 432 multiprocessor project, ancestor of iPSC and nCUBE.

In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time-period. The service guarantees must hold even when the system is subject to attacks or natural failures.

Radiation hardening is the process of making electronic components and circuits resistant to damage or malfunction caused by high levels of ionizing radiation, especially for environments in outer space, around nuclear reactors and particle accelerators, or during nuclear accidents or nuclear warfare.

<span class="mw-page-title-main">Electromagnetic interference</span> Disturbance in an electrical circuit due to external sources of radio waves

Electromagnetic interference (EMI), also called radio-frequency interference (RFI) when in the radio frequency spectrum, is a disturbance generated by an external source that affects an electrical circuit by electromagnetic induction, electrostatic coupling, or conduction. The disturbance may degrade the performance of the circuit or even stop it from functioning. In the case of a data path, these effects can range from an increase in error rate to a total loss of the data. Both man-made and natural sources generate changing electrical currents and voltages that can cause EMI: ignition systems, cellular network of mobile phones, lightning, solar flares, and auroras. EMI frequently affects AM radios. It can also affect mobile phones, FM radios, and televisions, as well as observations for radio astronomy and atmospheric science.

LEON is a radiation-tolerant 32-bit central processing unit (CPU) microprocessor core that implements the SPARC V8 instruction set architecture (ISA) developed by Sun Microsystems. It was originally designed by the European Space Research and Technology Centre (ESTEC), part of the European Space Agency (ESA), without any involvement by Sun. Later versions have been designed by Gaisler Research, under a variety of owners. It is described in synthesizable VHSIC Hardware Description Language (VHDL). LEON has a dual license model: An GNU Lesser General Public License (LGPL) and GNU General Public License (GPL) free and open-source software (FOSS) license that can be used without licensing fee, or a proprietary license that can be purchased for integration in a proprietary product. The core is configurable through VHDL generics, and is used in system on a chip (SOC) designs both in research and commercial settings.

Stratus VOS is a proprietary operating system running on Stratus Technologies fault-tolerant computer systems. VOS is available on Stratus's ftServer and Continuum platforms. VOS customers use it to support high-volume transaction processing applications which require continuous availability. VOS is notable for being one of the few operating systems which run on fully lockstepped hardware.

<span class="mw-page-title-main">Redundancy (engineering)</span> Duplication of critical components to increase reliability of a system

In engineering, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

Electromagnetic compatibility (EMC)-aware programming involves writing software which is resilient to errors induced by electromagnetic fields.

Emission-aware programming is a design philosophy aiming to reduce the amount of electromagnetic radiation emitted by electronic devices through proper design of the software executed by the device, rather than changing the hardware.

In computer science, fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach. Widely studied physical fault injections include the application of high voltages, extreme temperatures and electromagnetic pulses on electronic components, such as computer memory and central processing units. By exposing components to conditions beyond their intended operating limits, computing systems can be coerced into mis-executing instructions and corrupting critical data.

<span class="mw-page-title-main">Device driver synthesis and verification</span>

Device drivers are programs which allow software or higher-level computer programs to interact with a hardware device. These software components act as a link between the devices and the operating systems, communicating with each of these systems and executing commands. They provide an abstraction layer for the software above and also mediate the communication between the operating system kernel and the devices below.

Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.

Mathai Joseph is an Indian computer scientist and author.