Safety-critical system

Examples [1] of safety-critical systems. From left to right, top to bottom: the glass cockpit of a C-141, a pacemaker, the Space Shuttle and the control room of a nuclear power plant.

A safety-critical system [2] or life-critical system is a system whose failure or malfunction may result in one (or more) of the following outcomes: death or serious injury to people, loss of or severe damage to equipment or property, or harm to the environment. [3] [4]


A safety-related system (or sometimes safety-involved system) comprises everything (hardware, software, and human aspects) needed to perform one or more safety functions, in which failure would cause a significant increase in the safety risk for the people or environment involved. [5] Safety-related systems are those that do not have full responsibility for controlling hazards such as loss of life, severe injury or severe environmental damage. The malfunction of a safety-involved system becomes hazardous only in conjunction with the failure of other systems or human error. Some safety organizations provide guidance on safety-related systems, for example the Health and Safety Executive in the United Kingdom. [6]

Risks of this sort are usually managed with the methods and tools of safety engineering. A safety-critical system is designed to lose less than one life per billion (10⁹) hours of operation. [7] [8] Typical design methods include probabilistic risk assessment, a method that combines failure mode and effects analysis (FMEA) with fault tree analysis. Safety-critical systems are increasingly computer-based.
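As an illustration of how such an analysis combines failure probabilities (the component names and rates below are hypothetical, and independent failures are assumed), a simple fault tree can be evaluated in which the top event occurs if both redundant control channels fail or a single common-cause fault occurs, and the result compared with a 10⁻⁹ per hour target:

    #include <stdio.h>

    /* Hypothetical per-hour failure probabilities (illustrative values only). */
    #define P_CHANNEL_A 1.0e-5   /* primary control channel */
    #define P_CHANNEL_B 1.0e-5   /* redundant control channel */
    #define P_COMMON    5.0e-10  /* common-cause fault defeating both channels */
    #define TARGET      1.0e-9   /* catastrophic-failure target per hour */

    int main(void)
    {
        /* AND gate: both independent channels must fail. */
        double p_both = P_CHANNEL_A * P_CHANNEL_B;

        /* OR gate: top event = both channels fail, or a common-cause fault.
           For rare events, P(A or B) is approximated by P(A) + P(B). */
        double p_top = p_both + P_COMMON;

        printf("top-event probability: %.3e per hour\n", p_top);
        printf("meets 1e-9/h target:   %s\n", p_top < TARGET ? "yes" : "no");
        return 0;
    }

With these illustrative numbers the top-event probability is about 6×10⁻¹⁰ per hour, within the target; in practice the rates would come from FMEA data and field experience rather than assumed constants.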

The concept of a safety-critical system is often used together with the Swiss cheese model to represent (usually in a bow-tie diagram) how a threat can escalate to a major accident through the failure of multiple critical barriers. This use has become common especially in the domain of process safety, in particular as applied to oil and gas drilling and production, both for illustrative purposes and to support other processes, such as asset integrity management and incident investigation. [9]

Reliability regimes

Several reliability regimes for safety-critical systems exist:

Fail-operational systems continue to operate when their control systems fail; examples include elevators, the gas thermostats in most home furnaces, and passively safe nuclear reactors. Fail-operational behaviour is sometimes itself unsafe, as in the deliberately fail-deadly Soviet-era Perimeter nuclear-retaliation system. [10]

Fail-soft systems are able to continue operating on an interim basis with reduced efficiency in case of failure. [11]

Fail-safe systems become safe when they cannot operate; many medical devices and railway signalling systems are designed this way (see the sketch after this list).

Fail-secure systems maintain maximum security when they cannot operate; for example, a fail-secure electronic door lock stays locked during a power failure, whereas a fail-safe one unlocks.

Fail-passive systems continue to operate in the event of a system failure; an aircraft autopilot that fails passively leaves the aircraft in a controllable state and hands control back to the pilots rather than making an uncommanded input.

Fault-tolerant systems avoid service failure when faults occur, typically through redundancy such as redundant, voted control channels.
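As a minimal sketch of fail-safe behaviour (the plant, sensor, threshold and safe state below are hypothetical, and a real implementation would follow the applicable standard), the control step drives its actuator to a predefined safe state whenever the sensor faults or the reading is out of range:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VALVE_CLOSED 0u      /* defined safe state for this (hypothetical) plant */
    #define TEMP_MAX_C   120     /* shutdown threshold, illustrative */

    /* Stand-ins for platform-specific hardware access (hypothetical). */
    static bool read_temperature(int32_t *celsius) { *celsius = 135; return true; }
    static void set_valve(uint32_t position) { printf("valve -> %u\n", position); }

    /* One step of a fail-safe control loop: on a sensor fault or an
       out-of-range reading, command the actuator to its known safe state
       instead of trying to keep operating. */
    static void control_step(void)
    {
        int32_t temp_c;

        if (!read_temperature(&temp_c) || temp_c > TEMP_MAX_C) {
            set_valve(VALVE_CLOSED);   /* fail safe */
            return;
        }
        /* normal control action would go here */
    }

    int main(void) { control_step(); return 0; }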

Software engineering for safety-critical systems

Software engineering for safety-critical systems is particularly difficult. Three aspects help in engineering software for life-critical systems. First is process engineering and management. Second is selecting the appropriate tools and environment for the system, which allows the developer to test the system effectively by emulation and observe its effectiveness. Third is addressing any legal and regulatory requirements, such as Federal Aviation Administration requirements for aviation. Setting a standard under which a system must be developed forces the designers to stick to the requirements.

The avionics industry has succeeded in producing standard methods for producing life-critical avionics software. Similar standards exist for industry in general (IEC 61508) and for the automotive (ISO 26262), medical (IEC 62304) and nuclear (IEC 61513) industries specifically. The standard approach is to carefully code, inspect, document, test, verify and analyze the system. Another approach is to certify a production system and a compiler, and then generate the system's code from specifications. A further approach uses formal methods to generate proofs that the code meets requirements. [12] All of these approaches improve software quality in safety-critical systems by testing or eliminating manual steps in the development process, because people make mistakes, and these mistakes are the most common cause of potentially life-threatening errors.
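As a small illustration of the "carefully code, inspect, document, test, verify and analyze" style (the function, names and bounds are hypothetical and not taken from any particular standard), safety-critical code typically states its preconditions explicitly, bounds every input, and returns a defined error rather than failing silently:

    #include <stdint.h>
    #include <stdio.h>

    typedef enum { OK = 0, ERR_NULL = 1, ERR_RANGE = 2 } status_t;

    #define DOSE_MAX_ML 50u   /* illustrative upper bound on a single dose */

    /* Convert a requested dose into a pump command.  Every input is checked
       against an explicit bound and errors are reported rather than ignored,
       so reviewers and static-analysis tools can show that the function never
       produces an out-of-range command. */
    static status_t dose_to_command(uint32_t dose_ml, uint32_t *command_out)
    {
        if (command_out == NULL) {
            return ERR_NULL;             /* refuse rather than dereference */
        }
        if (dose_ml == 0u || dose_ml > DOSE_MAX_ML) {
            return ERR_RANGE;            /* out-of-bounds request is rejected */
        }
        *command_out = dose_ml * 10u;    /* fixed, bounded conversion */
        return OK;
    }

    int main(void)
    {
        uint32_t cmd;
        printf("status=%d\n", (int)dose_to_command(25u, &cmd));   /* OK */
        printf("status=%d\n", (int)dose_to_command(999u, &cmd));  /* ERR_RANGE */
        return 0;
    }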

Examples of safety-critical systems

Infrastructure

Medicine [13]

The technology requirements can go beyond avoidance of failure and can even facilitate medical intensive care (which deals with healing patients) and life support (which stabilizes patients).

Nuclear engineering [15]

Oil and gas production [16]

Recreation

Transport

Railway [17]

Automotive [19]

Aviation [20]

Spaceflight [21]


Related Research Articles

Safety engineering: Engineering discipline which assures that engineered systems provide acceptable levels of safety

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering and systems engineering, and to their subset, system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

In engineering, a fail-safe is a design feature or practice that, in the event of a specific type of failure, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is impossible or improbable, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. That is, if and when a "fail-safe" system fails, it remains at least as safe as it was before the failure. Since many types of failure are possible, failure mode and effects analysis is used to examine failure situations and recommend safety design and procedures.

Fault tree analysis: Failure analysis method used in safety engineering and reliability engineering

Fault tree analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk and to determine event rates of a safety accident or a particular system level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries; but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to cause-elimination technique used to detect bugs.

In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time-period. The service guarantees must hold even when the system is subject to attacks or natural failures.

A mission-critical factor of a system is any factor that is essential to business operations or to an organization. Failure or disruption of mission-critical factors will have a serious impact on business operations or on the organization, and can even cause social turmoil and catastrophes.

A built-in self-test (BIST) or built-in test (BIT) is a mechanism that permits a machine to test itself. Engineers design BISTs to meet requirements such as high reliability and lower repair cycle times, or to cope with constraints such as limited technician accessibility and the cost of testing during manufacture.
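A minimal sketch of such a start-up self-test follows (the memory image and checksum scheme are hypothetical; real BIST coverage is defined by the platform and its safety case):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* A small array stands in for the program memory covered by the test;
       on a real device this would be the flash image, with the expected
       checksum stored at build time. */
    static const uint8_t  rom_image[]  = { 0x12, 0x34, 0x56, 0x78 };
    static const uint32_t rom_expected = 0x12u + 0x34u + 0x56u + 0x78u;

    /* Start-up self-test: sum program memory and compare with the stored
       value.  A real BIST would also exercise RAM, clocks and peripherals,
       and would keep the system in a safe state if any check fails. */
    static bool rom_self_test(void)
    {
        uint32_t sum = 0u;
        for (size_t i = 0; i < sizeof rom_image; ++i) {
            sum += rom_image[i];
        }
        return sum == rom_expected;
    }

    int main(void)
    {
        printf("ROM self-test: %s\n", rom_self_test() ? "pass" : "fail");
        return 0;
    }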

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
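For a repairable item with a constant failure rate λ, these quantities are commonly related by the standard expressions for steady-state availability (from mean time between failures and mean time to repair) and for reliability over a mission time t:

    A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}, \qquad R(t) = e^{-\lambda t}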

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability to maintain functionality when portions of a system break down is referred to as graceful degradation.
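A common fault-tolerance mechanism is triple modular redundancy, in which three independent channels are compared and the majority value is used, so a single faulty channel is masked. A minimal sketch (illustrative only, with no fault logging or reconfiguration) follows:

    #include <stdint.h>
    #include <stdio.h>

    /* Majority vote over three redundant channel readings: one faulty
       channel is out-voted by the two healthy ones.  If no two channels
       agree there is no majority; a real system would raise a fault
       instead of picking a value, which ok_out reports here. */
    static int32_t tmr_vote(int32_t a, int32_t b, int32_t c, int *ok_out)
    {
        *ok_out = 1;
        if (a == b || a == c) return a;
        if (b == c)           return b;
        *ok_out = 0;                     /* no two channels agree */
        return a;
    }

    int main(void)
    {
        int ok;
        int32_t v = tmr_vote(100, 100, 97, &ok);   /* channel 3 is faulty */
        printf("voted=%d ok=%d\n", (int)v, ok);    /* voted=100 ok=1 */
        return 0;
    }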

DO-178B, Software Considerations in Airborne Systems and Equipment Certification is a guideline dealing with the safety of safety-critical software used in certain airborne systems. It was jointly developed by the safety-critical working group RTCA SC-167 of the Radio Technical Commission for Aeronautics (RTCA) and WG-12 of the European Organisation for Civil Aviation Equipment (EUROCAE). RTCA published the document as RTCA/DO-178B, while EUROCAE published the document as ED-12B. Although technically a guideline, it was a de facto standard for developing avionics software systems until it was replaced in 2012 by DO-178C.

In functional safety, safety integrity level (SIL) is defined as the relative level of risk-reduction provided by a safety instrumented function (SIF), i.e. the measurement of the performance required of the SIF.
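For a low-demand safety instrumented function this performance is expressed as an average probability of failure on demand (PFDavg), whose reciprocal is the risk reduction factor; under IEC 61508 each SIL corresponds to a decade band of PFDavg, so that, for example, SIL 3 corresponds to a PFDavg between 10⁻⁴ and 10⁻³ and hence a risk reduction factor between 1,000 and 10,000:

    \mathrm{RRF} = \frac{1}{\mathrm{PFD}_{\mathrm{avg}}}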

A hazard analysis is the first step in a process used to assess risk, and its result is the identification of different types of hazards. A hazard is a potential condition that may or may not exist; on its own, or in combination with other hazards and conditions, it may become an actual functional failure or accident (mishap). The way this happens in one particular sequence is called a scenario, and a system often has many potential failure scenarios. Each scenario has a probability of occurrence and is assigned a classification based on the worst-case severity of its end condition; risk is the combination of probability and severity. Preliminary risk levels can be provided in the hazard analysis, while the validation, more precise prediction (verification) and acceptance of risk are determined in the risk assessment (analysis). The main goal of both is to provide the best selection of means of controlling or eliminating the risk. The term is used in several engineering specialties, including avionics, food safety, occupational safety and health, process safety and reliability engineering.

IEC 61508 is an international standard published by the International Electrotechnical Commission (IEC) consisting of methods on how to apply, design, deploy and maintain automatic protection systems called safety-related systems. It is titled Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems.

IEC standard 61511 is a technical standard which sets out practices in the engineering of systems that ensure the safety of an industrial process through the use of instrumentation. Such systems are referred to as Safety Instrumented Systems. The title of the standard is "Functional safety - Safety instrumented systems for the process industry sector".

Medical equipment management is a term for the professionals who manage operations, analyze and improve utilization and safety, and support the servicing of healthcare technology. These healthcare technology managers are, much like other healthcare professionals, referred to by various specialty or organizational hierarchy names.

Functional safety is the part of the overall safety of a system or piece of equipment that depends on automatic protection operating correctly in response to its inputs or failure in a predictable manner (fail-safe). The automatic protection system should be designed to properly handle likely human errors, systematic errors, hardware failures and operational/environmental stress.

ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles, defined by the International Organization for Standardization (ISO) in 2011, and revised in 2018.

Partial stroke testing is a technique used in a control system to allow the user to test a percentage of the possible failure modes of a shut down valve without the need to physically close the valve. PST is used to assist in determining that the safety function will operate on demand. PST is most often used on high integrity emergency shutdown valves (ESDVs) in applications where closing the valve has a high cost burden yet proving the integrity of the valve is essential to maintaining a safe facility. In addition to ESDVs, PST is also used on high integrity pressure protection systems, or HIPPS. Partial stroke testing is not a replacement for the need to fully stroke valves, as proof testing is still a mandatory requirement.

AC 25.1309-1: American aviation regulatory document

AC 25.1309–1 is an FAA Advisory Circular (AC) that identifies acceptable means for showing compliance with the airworthiness requirements of § 25.1309 of the Federal Aviation Regulations. Revision A was released in 1988. In 2002, work was done on Revision B, but it was not formally released; the result is the Rulemaking Advisory Committee-recommended revision B-Arsenal Draft (2002). The Arsenal Draft is "considered to exist as a relatively mature draft". The FAA and EASA have subsequently accepted proposals by type certificate applicants to use the Arsenal Draft on development programs.

High-integrity software is software whose failure may cause serious damage with possible "life-threatening consequences." “Integrity is important as it demonstrates the safety, security, and maintainability of… code.” Examples of high-integrity software are nuclear reactor control, avionics software, and process control software.

[H]igh integrity means that the code:

References

  1. J.C. Knight (2002). "Safety critical systems: challenges and directions". IEEE. pp. 547–550.
  2. "Safety-critical system". encyclopedia.com . Retrieved 15 April 2017.
  3. Sommerville, Ian (2015). Software Engineering (PDF). Pearson India. ISBN   978-9332582699. Archived from the original (PDF) on 2018-04-17. Retrieved 2018-04-18.
  4. Sommerville, Ian (2014-07-24). "Critical systems". Ian Sommerville's book website. Archived from the original on 2019-09-16. Retrieved 18 April 2018.
  5. "FAQ – Edition 2.0: E) Key concepts". IEC 61508 – Functional Safety. International Electrotechnical Commission. Archived from the original on 25 October 2020. Retrieved 23 October 2016.
  6. "Part 1: Key guidance" (PDF). Managing competence for safety-related systems. UK: Health and Safety Executive. 2007. Retrieved 23 October 2016.
  7. FAA AC 25.1309-1A – System Design and Analysis
  8. Bowen, Jonathan P. (April 2000). "The Ethics of Safety-Critical Systems". Communications of the ACM . 43 (4): 91–97. doi:10.1145/332051.332078. S2CID   15979368.
  9. CCPS in association with Energy Institute (2018). Bow Ties in Risk Management: A Concept Book for Process Safety. New York, N.Y. and Hoboken, N.J.: AIChE and John Wiley & Sons. ISBN   9781119490395.
  10. Thompson, Nicholas (2009-09-21). "Inside the Apocalyptic Soviet Doomsday Machine". WIRED.
  11. "Definition fail-soft".
  12. Bowen, Jonathan P.; Stavridou, Victoria (July 1993). "Safety-critical systems, formal methods and standards". Software Engineering Journal . IEE/BCS. 8 (4): 189–209. doi:10.1049/sej.1993.0025. S2CID   9756364.
  13. "Medical Device Safety System Design: A Systematic Approach". mddionline.com. 2012-01-24.
  14. Anderson, RJ; Smith, MF, eds. (September–December 1998). "Special Issue: Confidentiality, Privacy and Safety of Healthcare Systems". Health Informatics Journal. 4 (3–4).
  15. "Safety of Nuclear Reactors". world-nuclear.org. Archived from the original on 2016-01-18. Retrieved 2013-12-18.
  16. Step Change in Safety (2018). Assurance and Verification Practitioners' Guidance Document. Aberdeen: Step Change in Safety.
  17. "Safety-Critical Systems in Rail Transportation" (PDF). Rtos.com. Archived from the original (PDF) on 2013-12-19. Retrieved 2016-10-23.
  18. Wayback Machine.
  19. "Safety-Critical Automotive Systems". sae.org.
  20. Leanna Rierson (2013-01-07). Developing Safety-Critical Software: A Practical Guide for Aviation Software and DO-178C Compliance. CRC Press. ISBN   978-1-4398-1368-3.
  21. "Human-Rating Requirements and Guidelinesfor Space Flight Systems" (PDF). NASA Procedures and Guidelines. June 19, 2003. NPG: 8705.2. Archived from the original (PDF) on 2021-03-17. Retrieved 2016-10-23.