Fault injection

Last updated

In computer science, fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach. [1] Widely studied physical fault injections include the application of high voltages, extreme temperatures and electromagnetic pulses on electronic components, such as computer memory and central processing units. [2] [3] By exposing components to conditions beyond their intended operating limits, computing systems can be coerced into mis-executing instructions and corrupting critical data.

Contents

In software testing, fault injection is a technique for improving the coverage of a test by introducing faults to test code paths; in particular error handling code paths, that might otherwise rarely be followed. It is often used with stress testing and is widely considered to be an important part of developing robust software. [4] Robustness testing [5] (also known as syntax testing, fuzzing or fuzz testing) is a type of fault injection commonly used to test for vulnerabilities in communication interfaces such as protocols, command line parameters, or APIs.

The propagation of a fault through to an observable failure follows a well-defined cycle. When executed, a fault may cause an error, which is an invalid state within a system boundary. An error may cause further errors within the system boundary, therefore each new error acts as a fault, or it may propagate to the system boundary and be observable. When error states are observed at the system boundary they are termed failures. This mechanism is termed the fault-error-failure cycle [6] and is a key mechanism in dependability.

History

The technique of fault injection dates back to the 1970s [7] when it was first used to induce faults at a hardware level. This type of fault injection is called Hardware Implemented Fault Injection (HWIFI) and attempts to simulate hardware failures within a system. The first experiments in hardware fault involved nothing more than shorting connections on circuit boards and observing the effect on the system (bridging faults). It was used primarily as a test of the dependability of the hardware system. Later specialised hardware was developed to extend this technique, such as devices to bombard specific areas of a circuit board with heavy radiation. It was soon found that faults could be induced by software techniques and that aspects of this technique could be useful for assessing software systems. Collectively these techniques are known as Software Implemented Fault Injection (SWIFI).

Software implemented fault injection

SWIFI techniques can be categorized into two types: compile-time injection and runtime injection.

Compile-time injection is an injection technique where source code is modified to inject simulated faults into a system. One method is called mutation testing which changes existing lines of code so that they contain faults. A simple example of this technique could be changing a = a + 1 to a = a – 1

Code mutation produces faults which are very similar to those unintentionally added by programmers.

A refinement of code mutation is Code Insertion Fault Injection which adds code, rather than modifying existing code. This is usually done through the use of perturbation functions which are simple functions which take an existing value and perturb it via some logic into another value, for example

intpFunc(intvalue){returnvalue+20;}intmain(intargc,char*argv[]){inta=pFunc(aFunction(atoi(argv[1])));if(a>20){/* do something */}else{/* do something else */}}

In this case, pFunc is the perturbation function and it is applied to the return value of the function that has been called introducing a fault into the system.

Runtime Injection techniques use a software trigger to inject a fault into a running software system. Faults can be injected via a number of physical methods and triggers can be implemented in a number of ways, such as: Time Based triggers (When the timer reaches a specified time an interrupt is generated and the interrupt handler associated with the timer can inject the fault. ); Interrupt Based Triggers (Hardware exceptions and software trap mechanisms are used to generate an interrupt at a specific place in the system code or on a particular event within the system, for instance, access to a specific memory location).

Runtime injection techniques can use a number of different techniques to insert faults into a system via a trigger.

These techniques are often based around the debugging facilities provided by computer processor architectures.

Protocol software fault injection

Complex software systems, especially multi-vendor distributed systems based on open standards, perform input/output operations to exchange data via stateful, structured exchanges known as "protocols." One kind of fault injection that is particularly useful to test protocol implementations (a type of software code that has the unusual characteristic in that it cannot predict or control its input) is fuzzing. Fuzzing is an especially useful form of Black-box testing since the various invalid inputs that are submitted to the software system do not depend on, and are not created based on knowledge of, the details of the code running inside the system.

Hardware implemented fault injection

This technique was applied on a hardware prototype. Testers inject fault by changing voltage of some parts in a circuit, increasing or decreasing temperature, bombarding the board by high energy radiation, etc.

Characteristics of fault injection

Faults have three main parameters. [8]

These parameters create the fault space realm. The fault space realm will increase exponentially by increasing system complexity. Therefore, the traditional fault injection method will not be applicable to use in the modern cyber-physical systems, because they will be so slow, and they will find a small number of faults (less fault coverage). Hence, the testers need an efficient algorithm to choose critical faults that have a higher impact on system behavior. Thus, the main research question is how to find critical faults in the fault space realm which have catastrophic effects on system behavior. Here are some methods that can aid fault injection to efficiently explore the fault space to reach higher fault coverage in less simulation time.

Fault injection tools

Although these types of faults can be injected by hand the possibility of introducing an unintended fault is high, so tools exist to parse a program automatically and insert faults.

Research tools

A number of SWIFI Tools have been developed and a selection of these tools is given here. Six commonly used fault injection tools are Ferrari, FTAPE, Doctor, Orchestra, Xception and Grid-FIT.

Commercial tools

Libraries

Fault injection in functional properties or test cases

In contrast to traditional mutation testing where mutant faults are generated and injected into the code description of the model, application of a series of newly defined mutation operators directly to the model properties rather than to the model code has also been investigated. [29] Mutant properties that are generated from the initial properties (or test cases) and validated by the model checker should be considered as new properties that have been missed during the initial verification procedure. Therefore, adding these newly identified properties to the existing list of properties improves the coverage metric of the formal verification and consequently lead to a more reliable design.

Application of fault injection

Fault injection can take many forms. In the testing of operating systems for example, fault injection is often performed by a driver (kernel-mode software) that intercepts system calls (calls into the kernel) and randomly returning a failure for some of the calls. This type of fault injection is useful for testing low-level user-mode software. For higher level software, various methods inject faults. In managed code, it is common to use instrumentation. Although fault injection can be undertaken by hand, a number of fault injection tools exist to automate the process of fault injection. [30]

Depending on the complexity of the API for the level where faults are injected, fault injection tests often must be carefully designed to minimize the number of false positives. Even a well designed fault injection test can sometimes produce situations that are impossible in the normal operation of the software. For example, imagine there are two API functions, Commit and PrepareForCommit, such that alone, each of these functions can possibly fail, but if PrepareForCommit is called and succeeds, a subsequent call to Commit is guaranteed to succeed. Now consider the following code:

error=PrepareForCommit();if(error==SUCCESS){error=Commit();assert(error==SUCCESS);}

Often, it will be infeasible for the fault injection implementation to keep track of enough state to make the guarantee that the API functions make. In this example, a fault injection test of the above code might hit the assert, whereas this would never happen in normal operation.

See also

Related Research Articles

<span class="mw-page-title-main">Software testing</span> Checking software against a standard

Software testing is the act of checking whether software satisfies expectations.

<span class="mw-page-title-main">Embedded system</span> Computer system with a dedicated function

An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts. Because an embedded system typically controls physical operations of the machine that it is embedded within, it often has real-time computing constraints. Embedded systems control many devices in common use. In 2009, it was estimated that ninety-eight percent of all microprocessors manufactured were used in embedded systems.

In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time-period. The service guarantees must hold even when the system is subject to attacks or natural failures.

System testing, a.k.a. end-to-end (E2E) testing, is testing conducted on a complete software system.

In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks. Typically, fuzzers are used to test programs that take structured inputs. This structure is specified, e.g., in a file format or protocol and distinguishes valid from invalid input. An effective fuzzer generates semi-valid inputs that are "valid enough" in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are "invalid enough" to expose corner cases that have not been properly dealt with.

Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components. This capability is essential for high-availability, mission-critical, or even life-critical systems.

Mutation testing is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways. Each mutated version is called a mutant and tests detect and reject mutants by causing the behaviour of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants. Mutants are based on well-defined mutation operators that either mimic typical programming errors or force the creation of valuable tests. The purpose is to help the tester develop effective tests or locate weaknesses in the test data used for the program or in sections of the code that are seldom or never accessed during execution. Mutation testing is a form of white-box testing.

Application security includes all tasks that introduce a secure software development life cycle to development teams. Its final goal is to improve security practices and, through that, to find, fix and preferably prevent security issues within applications. It encompasses the whole application life cycle from requirements analysis, design, implementation, verification as well as maintenance.

Stress testing is a software testing activity that determines the robustness of software by testing beyond the limits of normal operation. Stress testing is particularly important for "mission critical" software, but is used for all types of software. Stress tests commonly put a greater emphasis on robustness, availability, and error handling under a heavy load, than on what would be considered correct behavior under normal circumstances.

Fault Tolerant Messaging in the context of computer systems and networks, refers to a design approach and set of techniques aimed at ensuring reliable and continuous communication between components or nodes even in the presence of errors or failures. This concept is especially critical in distributed systems, where components may be geographically dispersed and interconnected through networks, making them susceptible to various potential points of failure.

Model-based design (MBD) is a mathematical and visual method of addressing problems associated with designing complex control, signal processing and communication systems. It is used in many motion control, industrial equipment, aerospace, and automotive applications. Model-based design is a methodology applied in designing embedded software.

<span class="mw-page-title-main">Emulator</span> System allowing a device to imitate another

In computing, an emulator is hardware or software that enables one computer system to behave like another computer system. An emulator typically enables the host system to run software or use peripheral devices designed for the guest system. Emulation refers to the ability of a computer program in an electronic device to emulate another program or device.

API testing is a type of software testing that involves testing application programming interfaces (APIs) directly and as part of integration testing to determine if they meet expectations for functionality, reliability, performance, and security. Since APIs lack a GUI, API testing is performed at the message layer. API testing is now considered critical for automating testing because APIs serve as the primary interface to application logic and because GUI tests are difficult to maintain with the short release cycles and frequent changes commonly used with Agile software development and DevOps.

In computer science, robustness is the ability of a computer system to cope with errors during execution and cope with erroneous input. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network. Formal techniques, such as fuzz testing, are essential to showing robustness since this type of testing involves invalid or unexpected inputs. Alternatively, fault injection can be used to test robustness. Various commercial products perform robustness testing of software analysis.

Robustness testing is any quality assurance methodology focused on testing the robustness of software. Robustness testing has also been used to describe the process of verifying the robustness of test cases in a test process. ANSI and IEEE have defined robustness as the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.

<span class="mw-page-title-main">Device driver synthesis and verification</span>

Device drivers are programs which allow software or higher-level computer programs to interact with a hardware device. These software components act as a link between the devices and the operating systems, communicating with each of these systems and executing commands. They provide an abstraction layer for the software above and also mediate the communication between the operating system kernel and the devices below.

Real-time testing is the process of testing real-time computer systems.

<span class="mw-page-title-main">Gatling (software)</span>

Gatling is a load- and performance-testing framework based on Scala, Akka and Netty. The first stable release was published on January 13, 2012. In 2015, Gatling's founder, Stéphane Landelle, created a company, dedicated to the development of the open-source project. According to Gatling Corp's official website, Gatling was downloaded more than 20,000,000 times (2024). In June 2016, Gatling officially presented Gatling FrontLine, Gatling's Enterprise Version with additional features.

Unified Diagnostic Services (UDS) is a diagnostic communication protocol used in electronic control units (ECUs) within automotive electronics, which is specified in the ISO 14229-1. It is derived from ISO 14230-3 (KWP2000) and the now obsolete ISO 15765-3. 'Unified' in this context means that it is an international and not a company-specific standard. By now this communication protocol is used in all new ECUs made by Tier 1 suppliers of Original Equipment Manufacturer (OEM), and is incorporated into other standards, such as AUTOSAR. The ECUs in modern vehicles control nearly all functions, including electronic fuel injection (EFI), engine control, the transmission, anti-lock braking system, door locks, braking, window operation, and more.

This article discusses a set of tactics useful in software testing. It is intended as a comprehensive list of tactical approaches to software quality assurance and general application of the test method.

References

  1. Moradi, Mehrdad; Van Acker, Bert; Vanherpen, Ken; Denil, Joachim (2019). "Model-Implemented Hybrid Fault Injection for Simulink (Tool Demonstrations)". In Chamberlain, Roger; Taha, Walid; Törngren, Martin (eds.). Cyber Physical Systems. Model-Based Design. Lecture Notes in Computer Science. Vol. 11615. Springer International Publishing. pp. 71–90. doi:10.1007/978-3-030-23703-5_4. ISBN   9783030237035. S2CID   195769468.
  2. Shepherd, Carlton; Markantonakis, Konstantinos; Van Heijningen, Nico; Aboulkassimi, Driss; Gaine, Clement; Heckmann, Thibaut; Naccache, David (2021). "Physical fault injection and side-channel attacks on mobile devices: A comprehensive analysis". Computers & Security. 111 (102471). Elsevier: 102471. arXiv: 2105.04454 . doi:10.1016/j.cose.2021.102471. S2CID   236957400.
  3. Bar-El, Hagai; Choukri, Hamid; Naccache, David; Tunstall, Michael; Whelan, Claire (2004). "The sorcerer's apprentice guide to fault attacks". Proceedings of the IEEE. 94 (2). IEEE: 370–382. doi:10.1109/JPROC.2005.862424. S2CID   2397174.
  4. J. Voas, "Fault Injection for the Masses," Computer, vol. 30, pp. 129–130, 1997.
  5. 1 2 Kaksonen, Rauli. A Functional Method for Assessing Protocol Implementation Security. 2001.
  6. A. Avizienis, J.-C. Laprie, Brian Randell, and C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing," Dependable and Secure Computing, vol. 1, pp. 11–33, 2004.
  7. 1 2 J. V. Carreira, D. Costa, and S. J. G, "Fault Injection Spot-Checks Computer System Dependability," IEEE Spectrum, pp. 50–55, 1999.
  8. Benso, Alfredo; Prinetto, Paolo, eds. (2003). Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing. Springer US. ISBN   978-1-4020-7589-6.
  9. "Optimizing fault injection in FMI co-simulation through sensitivity partitioning | Proceedings of the 2019 Summer Simulation Conference". dl.acm.org. Retrieved 2020-06-14.
  10. Moradi, M., Oakes, B.J., Saraoglu, M., Morozov, A., Janschek, K. and Denil, J., 2020. EXPLORING FAULT PARAMETER SPACE USING REINFORCEMENT LEARNING-BASED FAULT INJECTION.
  11. Rickard Svenningsson, Jonny Vinter, Henrik Eriksson and Martin Torngren, "MODIFI: A MODel-Implemented Fault Injection Tool,", Lecture Notes in Computer Science, 2010, Volume 6351/2010, 210-222.
  12. G. A. Kanawati, N. A. Kanawati, and J. A. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System," IEEE Transactions on Computers, vol. 44, pp. 248, 1995.
  13. T. Tsai and R. Iyer, "FTAPE: A Fault Injection Tool to Measure Fault Tolerance," presented at Computing in aerospace, San Antonio; TX, 1995.
  14. S. Han, K. G. Shin, and H. A. Rosenberg, "DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment for Distributed Real-time Systems," presented at International Computer Performance and Dependability Symposium, Erlangen; Germany, 1995.
  15. S. Dawson, F. Jahanian, and T. Mitton, "ORCHESTRA: A Probing and Fault Injection Environment for Testing Protocol Implementations," presented at International Computer Performance and Dependability Symposium, Urbana-Champaign, USA, 1996.
  16. Grid-FIT Web-site Archived 2 February 2008 at the Wayback Machine
  17. N. Looker, B. Gwynne, J. Xu, and M. Munro, "An Ontology-Based Approach for Determining the Dependability of Service-Oriented Architectures," in the proceedings of the 10th IEEE International Workshop on Object-oriented Real-time Dependable Systems, USA, 2005.
  18. N. Looker, M. Munro, and J. Xu, "A Comparison of Network Level Fault Injection with Code Insertion," in the proceedings of the 29th IEEE International Computer Software and Applications Conference, Scotland, 2005.
  19. LFI Website
  20. Flatag (2020-05-16), Flatag/FIBlock , retrieved 2020-05-16
  21. beSTORM product information
  22. ExhaustiF SWIFI Tool Site
  23. Holodeck product overview Archived 13 October 2008 at the Wayback Machine
  24. Codenomicon Defensics product overview
  25. Mu Service Analyzer
  26. Mu Dynamics, Inc.
  27. Xception Web Site
  28. Critical Software SA
  29. Abbasinasab, Ali; Mohammadi, Mehdi; Mohammadi, Siamak; Yanushkevich, Svetlana; Smith, Michael (2011). "Mutant Fault Injection in Functional Properties of a Model to Improve Coverage Metrics". 2011 14th Euromicro Conference on Digital System Design. pp. 422–425. doi:10.1109/DSD.2011.57. ISBN   978-1-4577-1048-3. S2CID   15992130.
  30. N. Looker, M. Munro, and J. Xu, "Simulating Errors in Web Services," International Journal of Simulation Systems, Science & Technology, vol. 5, 2004.