System safety

The system safety concept calls for a risk management strategy based on the identification and analysis of hazards and the application of remedial controls using a systems-based approach. [1] This differs from traditional safety strategies, which rely on controlling the conditions and causes of an accident based either on epidemiological analysis or on the investigation of individual past accidents. [2] The concept of system safety is useful in demonstrating the adequacy of technologies when difficulties are faced with probabilistic risk analysis. [3] The underlying principle is one of synergy: a whole is more than the sum of its parts. A systems-based approach to safety requires the application of scientific, technical and managerial skills to hazard identification, hazard analysis, and the elimination, control, or management of hazards throughout the life cycle of a system, program, project, activity or product. [1] HAZOP is one of several techniques available for the identification of hazards.

System approach

A system is defined as a set or group of interacting, interrelated or interdependent elements or parts that are organized and integrated to form a collective unity or a unified whole to achieve a common objective. [4] [5] This definition emphasises the interactions between the parts of a system and the external environment in performing a specific task or function in the context of an operational environment. The focus on interactions is a way of examining the expected or unexpected demands (inputs) that will be placed on the system and checking whether the necessary and sufficient resources are available to process them. These demands may take the form of stresses, which can be either expected, as part of normal operations, or unexpected, arising from unforeseen acts or conditions that produce beyond-normal (i.e., abnormal) stresses. This definition of a system therefore includes not only the product or process but also the influences that the surrounding environment (including human interactions) may have on the safety performance of that product or process. Conversely, system safety also takes into account the effects of the system on its surrounding environment; a correct definition and management of interfaces thus becomes very important. [4] [5] Broader definitions of a system encompass hardware, software, human systems integration, procedures and training. System safety, as part of the systems engineering process, should therefore systematically address all of these domains and areas in engineering and operations in a concerted fashion to prevent, eliminate and control hazards.
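
The demand-versus-resource view described above can be sketched in a few lines of code. Everything here is illustrative: the function name, the stress values and the capacity are invented for the example, not drawn from any real system or standard.

```python
# Toy sketch of the demand/resource view of a system: the system stays
# within its safe envelope only while its resources can absorb the
# stresses (expected and abnormal) placed on it. All names and numbers
# are illustrative.

def within_safe_envelope(demands: list[float], capacity: float) -> bool:
    """True if the combined stress on the system stays within capacity."""
    return sum(demands) <= capacity

expected = [3.0, 2.5]   # stresses from normal operations
abnormal = [4.0]        # stress from an unforeseen act or condition

print(within_safe_envelope(expected, 7.0))             # True
print(within_safe_envelope(expected + abnormal, 7.0))  # False
```

The point of the sketch is only that safety analysis must consider abnormal as well as expected demands: the system above is adequate for normal operations but not for the combined load.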

A "system", therefore, has implicit as well as explicit boundaries to which the systematic process of hazard identification, hazard analysis and control is applied. The system can range in complexity from a crewed spacecraft to an autonomous machine tool. The system safety concept helps the system designers to model, analyse, understand and eliminate hazards, and to apply controls to achieve an acceptable level of safety. Ineffective decision making in safety matters is regarded as the first step in the sequence of hazardous events in the "Swiss cheese" model of accident causation. [6] Communications about system risk have an important role to play in correcting risk perceptions, by creating, analysing and understanding an information model that shows which factors create, and which control, the hazardous process. [3] For almost any system, product or service, the most effective means of limiting product liability and accident risks is to implement an organized system safety function, beginning in the conceptual design phase and continuing through development, fabrication, testing, production, use and ultimate disposal. The aim of the system safety concept is to gain assurance that a system and its associated functionality behave safely and are safe to operate. This assurance is necessary because technological advances have historically produced negative as well as positive effects. [1]

Root cause analysis

A root cause analysis identifies the set of multiple causes that together might create a potential accident. Root cause techniques have been successfully borrowed from other disciplines and adapted to the needs of the system safety concept, most notably the tree structure from fault tree analysis, which was originally an engineering technique. [7] Root cause analysis techniques can be categorised into two groups: a) tree techniques, and b) checklist methods. Several root cause analysis techniques exist, e.g. Management Oversight and Risk Tree (MORT) analysis. [2] [8] [9] Others are Event and Causal Factor Analysis (ECFA), Multilinear Events Sequencing, the Sequentially Timed Events Plotting Procedure, and the Savannah River Plant Root Cause Analysis System. [7]
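
The tree structure borrowed from fault tree analysis can be sketched as a minimal evaluation of AND/OR gates over basic-event probabilities. This is a generic illustration of the fault tree idea, not MORT or any published analysis; the event names and probabilities are hypothetical, and independence of basic events is assumed.

```python
# Minimal sketch of fault tree evaluation: basic events combine through
# OR and AND gates up to a top (undesired) event. All events, values
# and the independence assumption are illustrative.

def or_gate(*probs: float) -> float:
    """P(at least one input event occurs), assuming independence."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

def and_gate(*probs: float) -> float:
    """P(all input events occur), assuming independence."""
    p = 1.0
    for q in probs:
        p *= q
    return p

# Hypothetical top event: loss of braking.
hydraulic_failure = or_gate(1e-4, 5e-5)   # line rupture OR seal leak
backup_failure = 2e-3                      # backup system unavailable
top_event = and_gate(hydraulic_failure, backup_failure)
print(f"P(top event) = {top_event:.2e}")
```

Tree techniques work top-down from the undesired event, which is what distinguishes them from the checklist methods mentioned above.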

Use in other fields

Safety engineering

Safety engineering describes some of the methods used in the nuclear and other industries. Traditional safety engineering techniques focus on the consequences of human error and do not investigate the causes or reasons for its occurrence. The system safety concept can be applied in this traditional field to help identify the set of conditions for safe operation of a system. Modern, more complex systems in the military and at NASA, with computer applications and controls, require functional hazard analyses and a set of detailed specifications at all levels that address safety attributes to be inherent in the design. The process, following a system safety program plan, preliminary hazard analyses, functional hazard assessments and system safety assessments, is to produce evidence-based documentation that will drive safety systems that are certifiable and that will hold up in litigation. The primary focus of any system safety plan, hazard analysis and safety assessment is to implement a comprehensive process to systematically predict or identify the operational behaviour of any safety-critical failure condition, fault condition or human error that could lead to a hazard and a potential mishap. This is used to influence requirements that drive control strategies and safety attributes, in the form of safety design features or safety devices, to prevent, eliminate and control (mitigate) safety risk. In the distant past hazards were the focus for very simple systems, but as technology and complexity advanced in the 1970s and 1980s, more modern and effective methods and techniques were invented using holistic approaches. Modern system safety is comprehensive: it is risk based, requirements based, function based and criteria based, with goal-structured objectives to yield engineering evidence verifying that safety functionality is deterministic and that risk is acceptable in the intended operating environment.
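
The risk-based, criteria-based classification described above is often expressed as a severity-by-probability matrix. The category names below follow the general style of military system safety practice (e.g. MIL-STD-882), but the specific risk-level assignments are a simplified illustration, not quoted from any standard.

```python
# Sketch of a hazard risk assessment matrix: each hazard is classified
# by worst-case severity and probability of occurrence, and the pair
# maps to a risk level. The mapping below is illustrative.

SEVERITY = ["Catastrophic", "Critical", "Marginal", "Negligible"]
PROBABILITY = ["Frequent", "Probable", "Occasional", "Remote", "Improbable"]

def risk_level(severity: str, probability: str) -> str:
    s = SEVERITY.index(severity)        # 0 = worst severity
    p = PROBABILITY.index(probability)  # 0 = most likely
    score = s + p                       # lower score = higher risk
    if score <= 2:
        return "High"
    if score <= 4:
        return "Serious"
    if score <= 6:
        return "Medium"
    return "Low"

print(risk_level("Catastrophic", "Remote"))   # hypothetical hazard
```

In practice the matrix drives requirements: hazards classified High or Serious must be eliminated or mitigated by design, and the residual risk formally accepted at an appropriate management level.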
Software-intensive systems that command, control and monitor safety-critical functions require extensive software safety analyses to influence detailed design requirements, especially in more autonomous or robotic systems with little or no operator intervention. Systems of systems, such as a modern military aircraft or fighting ship, with multiple integrated subsystems, sensor fusion, networking and interoperable systems, require extensive partnering and coordination with the multiple suppliers and vendors responsible for ensuring that safety is a vital attribute planned into the overall system.

Weapon system safety

Weapon system safety is an important application of the system safety field, due to the potentially destructive effects of a system failure or malfunction. A healthily sceptical attitude towards the system while it is at the requirements-definition and drawing-board stage, applied by conducting functional hazard analyses, helps in learning about the factors that create hazards and the mitigations that control them. A rigorous process is usually formally implemented as part of systems engineering to influence the design and improve the situation before errors and faults weaken the system defences and cause accidents. [1] [2] [3] [4]

Typically, weapon systems for ships, land vehicles, guided missiles and aircraft differ in their hazards and effects; some hazards are inherent, such as explosives, and some are created by the specific operating environment (as in, for example, aircraft sustaining flight). In the military aircraft industry, safety-critical functions are identified, the overall design architecture of hardware, software and human systems integration is thoroughly analyzed, and explicit safety requirements are derived and specified during a proven hazard analysis process to establish safeguards ensuring that essential functions are not lost, or function correctly in a predictable manner. Conducting comprehensive hazard analyses and determining the credible faults, failure conditions, contributing influences and causal factors that can contribute to or cause hazards is an essential part of the systems engineering process. Explicit safety requirements must be derived, developed, implemented and verified, with objective safety evidence and ample safety documentation showing due diligence. Highly complex, software-intensive systems with many complex interactions affecting safety-critical functions require extensive planning, special know-how, analytical tools, accurate models, modern methods and proven techniques. The objective is prevention of mishaps.

References

  1. Harold E. Roland; Brian Moriarty (1990). System Safety Engineering and Management. John Wiley & Sons. ISBN 0471618160.
  2. Jens Rasmussen; Annelise M. Pejtersen; L. P. Goodstein (1994). Cognitive Systems Engineering. John Wiley & Sons. ISBN 0471011983.
  3. Baruch Fischhoff (1995). "Risk Perception and Communication Unplugged: Twenty Years of Process". Risk Analysis, Vol. 15, No. 2.
  4. Alexander Kossiakoff; William N. Sweet (2003). Systems Engineering Principles and Practice. John Wiley & Sons. ISBN 0471234435.
  5. Charles S. Wasson (2006). System Analysis, Design and Development. John Wiley & Sons. ISBN 0471393339.
  6. James Reason (1990). Human Error. Ashgate. ISBN 1840141042.
  7. UK Health & Safety Executive (2001). Contract Research Report 321: Root Cause Analysis, Literature Review. UK HMSO. ISBN 0-7176-1966-4.
  8. "The Management Oversight and Risk Tree (MORT)". International Crisis Management Association. Archived from the original on 27 September 2014. Retrieved 1 October 2014.
  9. Entry for MORT on the FAA Human Factors Workbench.
