A safety-critical system [2] or life-critical system is a system whose failure or malfunction may result in one (or more) of the following outcomes: [3] [4]
- death or serious injury to people
- loss or severe damage to equipment or property
- harm to the environment
A safety-related system (or sometimes safety-involved system) comprises everything (hardware, software, and human aspects) needed to perform one or more safety functions, in which failure would cause a significant increase in the safety risk for the people or environment involved. [5] Safety-related systems do not have full responsibility for controlling hazards such as loss of life, severe injury or severe environmental damage; a malfunction of a safety-related system becomes hazardous only in combination with the failure of other systems or human error. Some safety organizations provide guidance on safety-related systems, for example the Health and Safety Executive in the United Kingdom. [6]
Risks of this sort are usually managed with the methods and tools of safety engineering. A safety-critical system is designed to lose less than one life per billion (10⁹) hours of operation. [7] [8] Typical design methods include probabilistic risk assessment, a method that combines failure mode and effects analysis (FMEA) with fault tree analysis. Safety-critical systems are increasingly computer-based.
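To make the 10⁹-hour target concrete, a common simplification (assumed here, not taken from the sources above) models failures with a constant failure rate λ, so reliability over an operating time t follows the exponential law:

```latex
R(t) = e^{-\lambda t}, \qquad
P_{\text{fail}}(t) = 1 - e^{-\lambda t} \approx \lambda t \quad (\lambda t \ll 1)
```

Under this assumption, a system meeting λ ≤ 10⁻⁹ failures per hour over an illustrative 30,000-hour service life has a failure probability of roughly 10⁻⁹ × 3×10⁴ = 3×10⁻⁵.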
The safety-critical system concept is often used together with the Swiss cheese model to represent (usually in a bow-tie diagram) how a threat can escalate to a major accident through the failure of multiple critical barriers. This use has become common especially in the domain of process safety, in particular when applied to oil and gas drilling and production, both for illustrative purposes and to support other processes, such as asset integrity management and incident investigation. [9]
Several reliability regimes for safety-critical systems exist:
- Fail-operational systems continue to operate when their control systems fail.
- Fail-soft systems continue to operate on an interim basis with reduced efficiency after a failure.
- Fail-safe systems become safe when they cannot operate, as many medical systems do.
- Fail-secure systems maintain maximum security when they cannot operate, such as electronic doors that lock during power failures.
- Fail-passive systems continue to operate in the event of a system failure; an aircraft autopilot is an example.
- Fault-tolerant systems avoid service failure when faults are introduced, for example control systems for nuclear reactors.
Software engineering for safety-critical systems is particularly difficult. Three aspects can be applied to aid the engineering of software for life-critical systems. First is process engineering and management. Second is selecting the appropriate tools and environment for the system; this allows the system developer to effectively test the system by emulation and observe its effectiveness. Third is addressing any legal and regulatory requirements, such as Federal Aviation Administration requirements for aviation. Setting a standard under which a system is required to be developed forces designers to adhere to its requirements. The avionics industry has succeeded in producing standard methods for producing life-critical avionics software. Similar standards exist for industry in general (IEC 61508) and for the automotive (ISO 26262), medical (IEC 62304) and nuclear (IEC 61513) industries specifically. The standard approach is to carefully code, inspect, document, test, verify and analyze the system. Another approach is to certify a production system, such as a compiler, and then generate the system's code from specifications. A further approach uses formal methods to generate proofs that the code meets requirements (a miniature illustration follows below). [12] All of these approaches improve software quality in safety-critical systems by testing or eliminating manual steps in the development process, because people make mistakes, and these mistakes are the most common cause of potential life-threatening errors.
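As a loose, miniature illustration of the formal-methods idea (not any specific tool or the method cited above), the sketch below states a requirement as an executable property and checks it exhaustively over a small input domain; the function clamp_dose, its bounds, and the domain are hypothetical. Real formal-methods tools prove such properties symbolically rather than by enumeration.

```python
# Miniature illustration: a safety requirement written as a checkable
# property. clamp_dose and its limits are hypothetical examples.

def clamp_dose(requested_ml: int, max_ml: int = 50) -> int:
    """Never deliver less than 0 or more than max_ml millilitres."""
    return min(max(requested_ml, 0), max_ml)

# Requirement: for every input in the domain, the output stays in [0, 50].
# Exhaustive enumeration works here only because the domain is tiny; a
# model checker or theorem prover establishes this symbolically at scale.
for requested in range(-1_000, 1_001):
    dose = clamp_dose(requested)
    assert 0 <= dose <= 50, f"requirement violated for input {requested}"
print("requirement holds on the checked domain")
```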
Technology requirements can go beyond the avoidance of failure and can even facilitate medical intensive care (which deals with healing patients) as well as life support (which stabilizes patients).
Safety engineering is an engineering discipline that assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering and systems engineering, and to their subset, system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.
In engineering, a fail-safe is a design feature or practice that, in the event of a specific type of failure, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is impossible or improbable, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. If and when a "fail-safe" system fails, it remains at least as safe as it was before the failure. Since many types of failure are possible, failure mode and effects analysis is used to examine failure situations and recommend safety design and procedures.
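A minimal sketch of the fail-safe principle, assuming a hypothetical heater whose safe state is "off": on any sensor fault the controller defaults to the safe state instead of continuing with unknown data. The names, threshold and fault model are illustrative only.

```python
# Sketch of a fail-safe control decision: any fault drives the output
# to the predetermined safe state (heater off). All names are hypothetical.

class SensorFault(Exception):
    """Raised when the temperature sensor cannot produce a valid reading."""

def read_temperature() -> float:
    # Placeholder for real hardware access; simulate a broken sensor here.
    raise SensorFault("sensor disconnected")

def heater_should_run() -> bool:
    """Return True to energize the heater; False is the safe (off) state."""
    try:
        return read_temperature() < 60.0   # normal control decision
    except SensorFault:
        return False                       # fail-safe: default to off

print("heater on" if heater_should_run() else "heater off (safe state)")
```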
Fault tree analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk, and to determine the event rates of a safety accident or a particular system-level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries, but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to the cause-elimination technique used to detect bugs.
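The arithmetic behind a simple fault tree can be sketched directly, assuming independent basic events: an OR gate combines probabilities as 1 − ∏(1 − pᵢ) and an AND gate as ∏pᵢ. The tree, event names and probabilities below are invented for illustration.

```python
# Minimal fault-tree evaluation, assuming independent basic events.
# The events, probabilities and tree structure are illustrative only.

from math import prod

def gate_or(*probs: float) -> float:
    """P(at least one input event occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)

def gate_and(*probs: float) -> float:
    """P(all input events occur)."""
    return prod(probs)

pump_fails   = 1e-3   # per-demand probabilities of the basic events
valve_sticks = 5e-4
backup_fails = 2e-3

# Top event: loss of cooling = (pump fails OR valve sticks) AND backup fails
top_event = gate_and(gate_or(pump_fails, valve_sticks), backup_fails)
print(f"P(loss of cooling) ~ {top_event:.1e}")   # ~ 3.0e-06
```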
In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time period. The service guarantees must hold even when the system is subject to attacks or natural failures.
A mission critical factor of a system is any factor that is essential to business, organizational, or governmental operations. Failure or disruption of mission critical factors will seriously impact business, organizational, or governmental operations and can even cause social turmoil and catastrophes.
A built-in self-test (BIST) or built-in test (BIT) is a mechanism that permits a machine to test itself. Engineers design BISTs to meet requirements such as high reliability and lower repair cycle times, or to cope with constraints such as limited technician accessibility and the cost of testing during manufacture (a minimal self-test sketch follows below).
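A hedged sketch of the BIST idea: a power-on routine that refuses to enter normal operation unless stored configuration data passes an integrity check. The data, the CRC choice and the mode names are hypothetical.

```python
# Illustrative power-on self-test: verify configuration memory against a
# checksum before allowing normal operation. Data and names are hypothetical.

import zlib

STORED_CONFIG = b"\x01\x02\x03\x04"      # stands in for device memory
STORED_CRC = zlib.crc32(STORED_CONFIG)   # reference value from manufacture

def power_on_self_test() -> bool:
    """Return True only if configuration memory passes its CRC check."""
    return zlib.crc32(STORED_CONFIG) == STORED_CRC

if power_on_self_test():
    print("BIST passed: entering normal operation")
else:
    print("BIST failed: staying in safe/maintenance mode")
```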
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time, or will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
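One common way to quantify that relationship, assuming steady-state operation and the usual mean-time figures (the numbers here are illustrative, not from the cited sources):

```latex
A \;=\; \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
  \;=\; \frac{10\,000\ \text{h}}{10\,000\ \text{h} + 1\ \text{h}}
  \;\approx\; 0.9999
```

So a unit that fails on average every 10,000 hours and takes one hour to repair is available about 99.99% of the time.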
Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components. This capability is essential for high-availability, mission-critical, or even life-critical systems.
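A classic fault-tolerance technique is triple modular redundancy (TMR), sketched below: three independent channels compute the same value and a majority voter masks any single faulty channel. The channel values are invented for the demonstration.

```python
# Sketch of triple modular redundancy (TMR): a 2-out-of-3 majority vote
# masks one faulty channel. Channel readings below are illustrative.

from collections import Counter

def majority_vote(a: int, b: int, c: int) -> int:
    """Return the value at least two of the three channels agree on."""
    value, count = Counter((a, b, c)).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one channel faulty")
    return value

# Channel B returns garbage; the voter masks the single fault.
print(majority_vote(42, 17, 42))   # -> 42
```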
DO-178B, Software Considerations in Airborne Systems and Equipment Certification is a guideline dealing with the safety of safety-critical software used in certain airborne systems. It was jointly developed by the safety-critical working group RTCA SC-167 of the Radio Technical Commission for Aeronautics (RTCA) and WG-12 of the European Organisation for Civil Aviation Equipment (EUROCAE). RTCA published the document as RTCA/DO-178B, while EUROCAE published the document as ED-12B. Although technically a guideline, it was a de facto standard for developing avionics software systems until it was replaced in 2012 by DO-178C.
In functional safety, safety integrity level (SIL) is defined as the relative level of risk-reduction provided by a safety instrumented function (SIF), i.e. the measurement of the performance required of the SIF.
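As an illustration (using the widely cited IEC 61508 low-demand bands, which should be checked against the standard itself), the SIL of a safety instrumented function follows from its average probability of failure on demand (PFDavg), and the risk reduction factor is its reciprocal:

```python
# Map a low-demand PFDavg to a SIL band per the commonly cited IEC 61508
# ranges; verify against the standard before relying on these numbers.

def sil_for_pfd(pfd_avg: float) -> int | None:
    """Return the SIL band (1-4) for a low-demand PFDavg, else None."""
    bands = {4: (1e-5, 1e-4), 3: (1e-4, 1e-3),
             2: (1e-3, 1e-2), 1: (1e-2, 1e-1)}
    for sil, (low, high) in bands.items():
        if low <= pfd_avg < high:
            return sil
    return None

pfd = 5e-4   # illustrative PFDavg for a safety instrumented function
print(f"PFDavg={pfd}: SIL {sil_for_pfd(pfd)}, risk reduction ~{1/pfd:.0f}x")
# -> PFDavg=0.0005: SIL 3, risk reduction ~2000x
```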
IEC 61508 is an international standard published by the International Electrotechnical Commission (IEC) that sets out methods for applying, designing, deploying and maintaining automatic protection systems called safety-related systems. It is titled "Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems".
IEC standard 61511 is a technical standard which sets out practices in the engineering of systems that ensure the safety of an industrial process through the use of instrumentation. Such systems are referred to as Safety Instrumented Systems. The title of the standard is "Functional safety - Safety instrumented systems for the process industry sector".
Software safety is an engineering discipline that aims to ensure that software used in safety-related systems does not contribute to any hazards such a system might pose. Numerous standards govern how safety-related software should be developed and assured in various domains. Most of them classify software according to its criticality and propose techniques and measures that should be employed during development and assurance, for example DO-178C for aviation, ISO 26262 for automotive, IEC 62304 for medical devices and IEC 61508 for electrical/electronic systems in general.
Medical equipment management is a field encompassing the professionals who manage operations, analyze and improve utilization and safety, and support the servicing of healthcare technology. These healthcare technology managers are, much like other healthcare professionals, referred to by various specialty or organizational hierarchy names.
Functional safety is the part of the overall safety of a system or piece of equipment that depends on automatic protection operating correctly in response to its inputs, or failing in a predictable manner (fail-safe). The automatic protection system should be designed to properly handle likely systematic errors, hardware failures and operational/environmental stress.
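A minimal sketch of such a protection function, with hypothetical tag names and trip limit: the logic runs to a safe state both when the input crosses the limit and when the input itself is faulted, matching the "fail in a predictable manner" requirement.

```python
# Sketch of an automatic protection function: trip to the safe state on an
# out-of-limit input or on an input fault. Names and limits are hypothetical.

def protection_logic(pressure_bar: float | None, trip_limit: float = 8.0) -> bool:
    """Return True to keep running; False trips the process to safe state."""
    if pressure_bar is None:      # input fault: fail predictably by tripping
        return False
    return pressure_bar < trip_limit

print(protection_logic(6.5))    # True: normal operation
print(protection_logic(9.2))    # False: overpressure trip
print(protection_logic(None))   # False: sensor fault handled fail-safe
```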
ISO 26262, titled "Road vehicles – Functional safety", is an international standard for the functional safety of electrical and/or electronic systems installed in series production road vehicles, defined by the International Organization for Standardization (ISO) in 2011 and revised in 2018.
Partial stroke testing (PST) is a technique used in a control system to allow the user to test a percentage of the possible failure modes of a shutdown valve without having to fully close the valve. PST is used to assist in determining that the safety function will operate on demand. It is most often used on high-integrity emergency shutdown valves (ESDVs) in applications where closing the valve carries a high cost burden, yet proving the integrity of the valve is essential to maintaining a safe facility. In addition to ESDVs, PST is also used on high integrity pressure protection systems (HIPPS). Partial stroke testing is not a replacement for fully stroking valves, as proof testing remains a mandatory requirement.
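The PST sequence can be sketched as follows, with an entirely hypothetical valve interface (move_valve and read_position are placeholders, not a real control-system API): command a small travel, confirm the valve actually moved, then restore the normal position without interrupting the process.

```python
# Hedged sketch of a partial stroke test sequence. The valve interface and
# travel figures are hypothetical placeholders, not a real API.

def partial_stroke_test(move_valve, read_position,
                        test_travel: float = 10.0, tol: float = 2.0) -> bool:
    """Return True if the valve demonstrably moved by ~test_travel percent."""
    start = read_position()              # expected ~100% (fully open)
    move_valve(start - test_travel)      # command the partial closure
    moved = abs(start - read_position()) >= test_travel - tol
    move_valve(start)                    # restore the normal position
    return moved

class SimValve:                          # stand-in for a real actuator
    def __init__(self) -> None:
        self.pos = 100.0
    def move(self, p: float) -> None:
        self.pos = p
    def position(self) -> float:
        return self.pos

valve = SimValve()
ok = partial_stroke_test(valve.move, valve.position)
print("PST passed" if ok else "PST failed: valve may be stuck")
```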
AC 25.1309–1 is an FAA Advisory Circular (AC) that identifies acceptable means for showing compliance with the airworthiness requirements of § 25.1309 of the Federal Aviation Regulations. Revision A was released in 1988. In 2002, work was done on Revision B, but it was not formally released; the result is the Rulemaking Advisory Committee-recommended revision B-Arsenal Draft (2002). The Arsenal Draft is "considered to exist as a relatively mature draft". The FAA and EASA have subsequently accepted proposals by type certificate applicants to use the Arsenal Draft on development programs.
High-integrity software is software whose failure may cause serious damage with possible "life-threatening consequences." "Integrity is important as it demonstrates the safety, security, and maintainability of... code." Examples of high-integrity software are nuclear reactor control, avionics software, automotive safety-critical software and process control software.
[H]igh integrity means that the code: