SAPHIRE

SAPHIRE is a probabilistic risk and reliability assessment software tool. SAPHIRE stands for Systems Analysis Programs for Hands-on Integrated Reliability Evaluations. The system was developed for the U.S. Nuclear Regulatory Commission (NRC) by the Idaho National Laboratory.

Development began in the mid-1980s, when the NRC started exploring two notions: 1) that Probabilistic Risk Assessment (PRA) information could be displayed and manipulated using the emerging microcomputer technology of the day, and 2) that the rapid advancement of PRA technology required a relatively inexpensive and readily available platform for teaching PRA concepts to students.

The history of SAPHIRE

1987 Version 1 of the code called IRRAS (now known as SAPHIRE) introduced an innovative way to draw, edit, and analyze graphical fault trees.

1989 Version 2 is released incorporating the ability to draw, edit, and analyze graphical event trees.

1990 Analysis improvements to IRRAS led to the release of Version 4 and the formation of the IRRAS Users Group.

1992 Creation of 32-bit IRRAS, Version 5, resulted in an order-of-magnitude decrease in analysis time. New features included: end state analysis; fire, flood, and seismic modules; rule-base cut set processing; and rule-based fault tree to event tree linking.

1997 SAPHIRE for Windows, version 6.x, is released. Use of a Windows user-interface makes SAPHIRE easy to learn. The new "plug-in" feature allows analysts to expand on the built-in probability calculations.

1999 SAPHIRE for Windows, version 7.x, is released. Enhancements are made to the event tree "linking rules" and to the use of dual language capability inside the SAPHIRE database.

2005 SAPHIRE for Windows, version 8.x, undergoes development.

2008 SAPHIRE for Windows, version 8.x, is released as a beta version.

2010 SAPHIRE for Windows, version 8.x, is released for U.S. Government and industry use.

The evolution of software and related analysis methods has led to the current generation of the SAPHIRE tool. The current SAPHIRE code base started in the mid-1980s as part of the NRC's general risk activities. In 1986, work commenced on the precursor to the SAPHIRE software; this package was named the Integrated Reliability and Risk Analysis System, or IRRAS. IRRAS was the first IBM-compatible PC-based risk analysis tool developed at the Idaho National Laboratory, allowing users to work in a graphical interface rather than with mainframe punch cards. Although limited to the analysis of medium-sized fault trees, version 1 of IRRAS was the initial step in the progress that has led to today's SAPHIRE software, which is capable of running on multiple processors simultaneously and of handling extremely large analyses.

NASA use

Historically, NASA relied on worst-case failure mode and effects analysis for safety assessment. However, this approach has drawbacks: it is qualitative and does not aggregate risk at a system or mission level. On October 29, 1986, the investigation of the Challenger accident criticized NASA for not “estimating the probability of failure of the various [Shuttle] elements.” Further, in January 1988, the Post-Challenger investigation recommended that “probabilistic risk assessment approaches be applied to the Shuttle risk management program.”

Consequently, probabilistic methods are now used at NASA, and a number of NASA projects have used the SAPHIRE software as their primary risk analysis tool.

Advanced analysis

SAPHIRE contains an advanced minimal cut set solving engine. This solver, which has been fine-tuned and optimized over time, employs a variety of analysis techniques.

Use of these and other optimization methods has resulted in SAPHIRE having one of the most powerful analysis engines in use for probabilistic risk assessment today.
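As a rough illustration of what a minimal cut set calculation involves, the sketch below expands a small fault tree into cut sets and applies Boolean absorption to keep only the minimal ones. This is not SAPHIRE's actual solver or its data format; the gate encoding, event names, and example tree are hypothetical.

```python
from itertools import product

# Illustrative sketch only: a naive minimal cut set calculation for a small
# static fault tree. This is not SAPHIRE's solver; gates, event names, and
# the example tree are hypothetical.

def cut_sets(node, tree):
    """Return the cut sets (frozensets of basic events) for a node."""
    if node not in tree:
        return [frozenset([node])]            # leaf: a basic event
    kind, children = tree[node]
    child_sets = [cut_sets(c, tree) for c in children]
    if kind == "OR":                          # union of the children's cut sets
        return [cs for sets in child_sets for cs in sets]
    if kind == "AND":                         # cross-product of the children's cut sets
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unsupported gate type: {kind}")

def minimize(sets):
    """Boolean absorption: drop any cut set that contains another cut set."""
    unique = sorted(set(sets), key=len)
    minimal = []
    for cs in unique:
        if not any(m <= cs for m in minimal):
            minimal.append(cs)
    return minimal

# Hypothetical top event: the system fails if pump P fails AND
# (valve V fails OR power supply B fails).
tree = {
    "TOP": ("AND", ["P", "G1"]),
    "G1": ("OR", ["V", "B"]),
}
print(minimize(cut_sets("TOP", tree)))        # two minimal cut sets: {P, V} and {P, B}
```

A production solver layers cut set truncation and other optimizations on top of this basic idea, but the absorption step shown here is the core of reducing cut sets to minimal form.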

Basic event probabilities

SAPHIRE supports a range of general basic event probability models.
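As a hedged illustration (the function and parameter names below are not SAPHIRE's own input fields), two basic event models that are standard in PRA are a constant probability of failure on demand and a failure rate combined with a mission time:

```python
import math

# Minimal sketch of two standard PRA basic-event quantification models.
# Illustrative only: names and values are hypothetical, not SAPHIRE's inputs.

def prob_on_demand(p):
    """Constant probability that the component fails when demanded."""
    return p

def prob_mission_time(failure_rate, mission_time):
    """Probability of at least one failure during the mission,
    assuming a constant failure rate: 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-failure_rate * mission_time)

# Example: a failure rate of 1e-5 per hour over a 24-hour mission.
print(prob_mission_time(1e-5, 24.0))   # roughly 2.4e-4
print(prob_on_demand(3e-3))            # 0.003
```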

SAPHIRE has been designed to handle large fault trees, where a tree may have up to 64,000 basic events and gates. To handle these fault trees, two mechanisms for developing and modifying a fault tree are available: a graphical editor and a hierarchical logic editor. Analysts may use either editor; if the logic is modified, SAPHIRE can redraw the fault tree graphic, and conversely, if the user modifies the fault tree graphic, SAPHIRE automatically updates the associated logic. Applicable objects available in the fault tree editors include basic events and several gate types: OR, AND, NOR, NAND, and N-of-M. In addition to these objects, SAPHIRE has a unique feature known as “table events” that allows the user to group up to eight basic events together on the fault tree graphic, thereby compacting the size of the fault tree on the printed page or computer screen. All of these objects, however, represent traditional static Boolean logic models; models explicitly capturing dynamic or time-dependent situations are not available in current versions of SAPHIRE.
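The static gate types listed above can be illustrated with a short sketch that evaluates a fault tree for a given set of basic event states. This shows the Boolean logic only; it is not SAPHIRE code or its file format, and the gate encoding and example tree are hypothetical.

```python
# Sketch of the static Boolean gate types mentioned above (OR, AND, NOR,
# NAND, N-of-M). Not SAPHIRE code; the encoding and example are hypothetical.
# Basic event states are booleans (True = the event has occurred/failed).

def evaluate(node, tree, states):
    if node not in tree:
        return states[node]                   # leaf: a basic event
    gate = tree[node]
    kind, children = gate[0], gate[1]
    vals = [evaluate(c, tree, states) for c in children]
    if kind == "OR":
        return any(vals)
    if kind == "AND":
        return all(vals)
    if kind == "NOR":
        return not any(vals)
    if kind == "NAND":
        return not all(vals)
    if kind == "N_OF_M":                      # true if at least n of the inputs are true
        return sum(vals) >= gate[2]
    raise ValueError(f"unsupported gate type: {kind}")

# Hypothetical example: the top event occurs if at least 2 of 3 trains fail.
tree = {"TOP": ("N_OF_M", ["A", "B", "C"], 2)}
print(evaluate("TOP", tree, {"A": True, "B": False, "C": True}))   # True
```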

Related Research Articles

Safety engineering

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Fault tree analysis

Fault tree analysis (FTA) is a top-down, deductive failure analysis in which an undesired state of a system is analyzed using Boolean logic to combine a series of lower-level events. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk and to determine event rates of a safety accident or a particular system level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries; but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to cause-elimination technique used to detect bugs.

Safety-critical system

A safety-critical system (SCS) or life-critical system is a system whose failure or malfunction may result in death or injury to people, damage to or loss of equipment or property, or harm to the environment.

Failure mode and effects analysis is the process of reviewing as many components, assemblies, and subsystems as possible to identify potential failure modes in a system and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. An FMEA can be a qualitative analysis, but may be put on a quantitative basis when mathematical failure rate models are combined with a statistical failure mode ratio database. It was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study.
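As a small, hedged illustration of the quantitative step described above (combining a failure rate model with a failure mode ratio database), a component's overall failure rate can be apportioned among its failure modes; the component, rate, and ratios below are invented for the example.

```python
# Hedged illustration of quantitative FMEA: apportioning a component's
# overall failure rate among its failure modes using mode ratios.
# The component, rate, and ratios are invented for this example.

component_failure_rate = 5.0e-6            # failures per hour (hypothetical)
failure_mode_ratios = {                    # fractions should sum to 1.0
    "fails to open": 0.6,
    "fails to close": 0.3,
    "external leakage": 0.1,
}

for mode, ratio in failure_mode_ratios.items():
    print(f"{mode}: {component_failure_rate * ratio:.1e} failures per hour")
```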

Human reliability is related to the field of human factors and ergonomics, and refers to the reliability of humans in fields including manufacturing, medicine and nuclear power. Human performance can be affected by many factors such as age, state of mind, physical health, attitude, emotions, propensity for certain common mistakes, errors and cognitive biases, etc.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

Probabilistic risk assessment (PRA) is a systematic and comprehensive methodology to evaluate risks associated with a complex engineered technological entity or, for example, the effects of stressors on the environment.

WASH-1400, 'The Reactor Safety Study', was a report produced in 1975 for the Nuclear Regulatory Commission by a committee of specialists under Professor Norman Rasmussen. It "generated a storm of criticism in the years following its release". In the years immediately after its release, WASH-1400 was followed by a number of reports that either peer reviewed its methodology or offered their own judgments about probabilities and consequences of various events at commercial reactors. In at least a few instances, some offered critiques of the study's assumptions, methodology, calculations, peer review procedures, and objectivity. A succession of reports, including NUREG-1150, the State-of-the-Art Reactor Consequence Analyses, and others, has carried on the tradition of PRA and its application to commercial power plants.

A hazard analysis is used as the first step in a process used to assess risk. The result of a hazard analysis is the identification of different types of hazards. A hazard is a potential condition that may or may not be realized; alone or in combination with other hazards and conditions, it may become an actual functional failure or accident (mishap). The way this happens in one particular sequence is called a scenario, and each scenario has a probability of occurrence. Often a system has many potential failure scenarios. Each scenario is also assigned a classification based on the worst-case severity of the end condition. Risk is the combination of probability and severity. Preliminary risk levels can be provided in the hazard analysis. The validation, more precise prediction (verification), and acceptance of risk are determined in the risk assessment (analysis). The main goal of both is to provide the best selection of means of controlling or eliminating the risk. The term is used in several engineering specialties, including avionics, chemical process safety, safety engineering, reliability engineering, and food safety.

ARP4761, Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment is an Aerospace Recommended Practice from SAE International. In conjunction with ARP4754, ARP4761 is used to demonstrate compliance with 14 CFR 25.1309 in the U.S. Federal Aviation Administration (FAA) airworthiness regulations for transport category aircraft, and also harmonized international airworthiness regulations such as European Aviation Safety Agency (EASA) CS–25.1309.

Failure mode effects and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).

IEC 61508 is an international standard published by the International Electrotechnical Commission consisting of methods on how to apply, design, deploy and maintain automatic protection systems called safety-related systems. It is titled Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems.

NESSUS is a general-purpose, probabilistic analysis program that simulates variations and uncertainties in loads, geometry, material behavior and other user-defined inputs to compute probability of failure and probabilistic sensitivity measures of engineered systems. Because NESSUS uses highly efficient and accurate probabilistic analysis methods, probabilistic solutions can be obtained even for extremely large and complex models. The system performance can be hierarchically decomposed into multiple smaller models and/or analytical equations. Once the probabilistic response is quantified, the results can be used to support risk-informed decisions regarding reliability for safety critical and one-of-a-kind systems, and to maintain a level of quality while reducing manufacturing costs for larger quantity products.
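A generic sketch of the kind of probabilistic calculation described above is a Monte Carlo "load versus strength" failure probability estimate. This is not NESSUS code; the distributions and parameters are invented for illustration.

```python
import random

# Generic Monte Carlo sketch of a probabilistic failure calculation in the
# spirit described above. Not NESSUS code; distributions and parameters are
# invented for illustration.

def probability_of_failure(n_samples=100_000, seed=1):
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        load = rng.gauss(100.0, 15.0)        # applied load (hypothetical units)
        strength = rng.gauss(150.0, 20.0)    # component strength (hypothetical)
        if load > strength:
            failures += 1
    return failures / n_samples

print(probability_of_failure())              # roughly 0.02 for these parameters
```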

An event tree is an inductive analytical diagram in which an event is analyzed using Boolean logic to examine a chronological series of subsequent events or consequences. For example, event tree analysis is a major component of nuclear reactor safety engineering.

The technique for human error-rate prediction (THERP) is a technique used in the field of human reliability assessment (HRA), for the purposes of evaluating the probability of a human error occurring throughout the completion of a specific task. From such analyses measures can then be taken to reduce the likelihood of errors occurring within a system and therefore lead to an improvement in the overall levels of safety. There exist three primary reasons for conducting an HRA: error identification, error quantification and error reduction. As there exist a number of techniques used for such purposes, they can be split into one of two classifications: first-generation techniques and second-generation techniques. First-generation techniques work on the basis of the simple dichotomy of ‘fits/doesn’t fit’ in matching an error situation in context with related error identification and quantification. Second-generation techniques are more theory-based in their assessment and quantification of errors. HRA techniques have been utilised for various applications in a range of disciplines and industries including healthcare, engineering, nuclear, transportation and business.

GoldSim is dynamic, probabilistic simulation software developed by GoldSim Technology Group. This general-purpose simulator is a hybrid of several simulation approaches, combining an extension of system dynamics with some aspects of discrete event simulation, and embedding the dynamic simulation engine within a Monte Carlo simulation framework.

Risk management tools allow uncertainty to be addressed by identifying and generating metrics, parameterizing, prioritizing, and developing responses, and tracking risk. These activities may be difficult to track without tools and techniques, documentation, and information systems.

Ecolego

Ecolego is a simulation software tool that is used for creating dynamic models and performing deterministic and probabilistic simulations. It is also used for conducting risk assessments of complex dynamic systems evolving over time.

Event tree analysis (ETA) is a forward, top-down, logical modeling technique for both success and failure that explores responses to a single initiating event and lays a path for assessing probabilities of the outcomes and overall system analysis. This analysis technique is used to analyze the effects of functioning or failed systems given that an event has occurred. ETA is a powerful tool that identifies all consequences of a system that have a probability of occurring after an initiating event, and it can be applied to a wide range of systems, including nuclear power plants, spacecraft, and chemical plants. The technique may be applied to a system early in the design process to identify potential issues that may arise, rather than correcting the issues after they occur. With this forward logic process, use of ETA as a tool in risk assessment can help prevent negative outcomes by providing a risk assessor with the probability of occurrence. ETA uses a type of modeling technique called an event tree, which branches events from a single event using Boolean logic.
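As a brief, hedged illustration of event tree quantification, the sketch below multiplies an initiating event frequency by the success or failure probability at each branch point to obtain a sequence frequency; the systems, frequencies, and probabilities are invented for the example.

```python
# Hedged sketch of event tree quantification: an initiating event frequency
# is multiplied by the success/failure probability at each branch point to
# give the frequency of a particular accident sequence. All values here are
# invented for illustration.

initiating_event_freq = 1.0e-3        # initiating events per year (hypothetical)
branch_points = [
    ("automatic shutdown", 1.0e-2),   # failure probability at each branch (hypothetical)
    ("emergency cooling", 5.0e-3),
]

def sequence_frequency(freq, failed_systems):
    """Frequency of the sequence in which the named systems fail and the rest succeed."""
    f = freq
    for name, p_fail in branch_points:
        f *= p_fail if name in failed_systems else (1.0 - p_fail)
    return f

# Sequence in which both mitigating systems fail:
print(sequence_frequency(initiating_event_freq, {"automatic shutdown", "emergency cooling"}))
# roughly 5e-8 per year
```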