Technique for human error-rate prediction

The technique for human error-rate prediction (THERP) is a technique used in the field of human reliability assessment (HRA) to evaluate the probability of a human error occurring during the completion of a task. Once the probability of human error in a given task has been calculated, corrective measures can be taken to reduce the likelihood of errors occurring within a system. The overall goal of THERP is to apply and document probabilistic analyses in order to increase safety during a given process. HRA is conducted for three primary reasons: error identification, error quantification and error reduction. [1]

Techniques

HRA comprises a number of techniques, which are split into two classifications: first-generation techniques and second-generation techniques. First-generation techniques work on a simple dichotomy of whether or not the technique fits the error situation when matching it with the related error identification and quantification. Second-generation techniques are more theory-based in their assessment and quantification of errors, and address the situational or interactive elements of the task. HRA techniques are used for various applications in a range of disciplines and industries including healthcare, engineering, nuclear power, transportation, and business.

THERP models human error probabilities (HEPs) using a fault-tree approach, similar to an engineering risk assessment, which integrates and accounts for performance-shaping factors (PSFs) that may influence these probabilities. The probabilities for the human reliability analysis event tree (HRAET), the main assessment tool, are drawn from a database developed by Alan D. Swain and H. E. Guttmann. Local data, for example from simulations or accident reports, may be used instead where it deepens the examination of human-related error. The resultant tree portrays a step-by-step account of the stages involved in a task, in a logical order. The technique is known as a total methodology [2] because it simultaneously manages many different activities, including task analysis, error identification, representation in the form of an HRAET, and HEP quantification.

Background

THERP is a first-generation methodology, which means that its procedures follow the way conventional reliability analysis models a machine. [3] The technique was developed at Sandia Laboratories for the US Nuclear Regulatory Commission. [4] Its primary author is Swain, who developed the THERP methodology gradually over a lengthy period. [2] THERP relies on a large human reliability database that contains HEPs and is based upon both plant data and expert judgments. The technique was the first approach in HRA to come into broad use and is still widely used in a range of applications, even beyond its original nuclear setting.

THERP methodology

The methodology for the THERP technique is broken down into 5 main stages:

1. Define the system failures of interest

These failures include functions of the system where human error has a greater likelihood of influencing the probability of a fault, and those of interest to the risk assessor; operations in which there may be no interest include those not operationally critical or those for which there already exist safety countermeasures.

2. List and analyse the related human operations, and identify human errors that can occur and relevant human error recovery modes

This stage of the process necessitates a comprehensive task and human error analysis. The task analysis lists and sequences the discrete elements and information required by task operators. For each step of the task, possible errors are considered by the analyst and precisely defined. Such errors can be broken down into the following categories:

The opportunity for error recovery must also be considered as this, if achieved, has the potential to drastically reduce error probability for a task.

The tasks and associated outcomes are input to an HRAET in order to provide a graphical representation of a task’s procedure. The tree’s compatibility with conventional event-tree methodology, i.e. including binary decision points at the end of each node, allows it to be evaluated mathematically.

An event tree visually displays all events that occur within a system. It starts off with an initiating event, then branches develop as various consequences of the starting event. These are represented in a number of different paths, each associated with a probability of occurrence. As mentioned previously, the tree works on a binary logic, so each event either succeeds or fails.

Below is an example of an event tree that represents a system fire:

Figure: example event tree for a system fire.

Provided that all of a task’s sub-tasks are fully represented within an HRAET and the failure probability for each sub-task is known, it is possible to calculate the final reliability for the task.
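As a minimal sketch of that calculation, assuming the sub-tasks are independent and that there are no recovery paths (both are simplifying assumptions made here for illustration), the task reliability is the product of the sub-task success probabilities. The function name and the HEP values below are hypothetical.

```python
from math import prod


def task_reliability(sub_task_heps):
    """Return the probability that every sub-task succeeds.

    Assumes independent sub-tasks and no error recovery (illustrative only).
    """
    for hep in sub_task_heps:
        if not 0.0 <= hep <= 1.0:
            raise ValueError(f"HEP must lie in [0, 1], got {hep}")
    # Success on a sub-task has probability (1 - HEP); the task succeeds
    # only if all of its sub-tasks succeed.
    return prod(1.0 - hep for hep in sub_task_heps)


# Hypothetical HEPs for three sequential sub-tasks.
heps = [0.003, 0.01, 0.001]
reliability = task_reliability(heps)
print(f"Task reliability: {reliability:.6f}")
print(f"Task HEP:         {1.0 - reliability:.6f}")
```

A real HRAET would usually also contain recovery branches and dependence between steps, which this sketch deliberately leaves out.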

3. Estimate the relevant error probabilities

HEPs for each sub-task are entered into the tree; all failure branches must have a known probability, otherwise the tree cannot yield a final answer. HRAETs break down the primary operator tasks into finer steps, which are represented in the form of successes and failures. The tree indicates the order in which the events occur and also considers likely failures that may occur at each of the represented branches. The degree to which each high-level task is broken down into lower-level tasks depends on the availability of HEPs for the successive individual branches. The HEPs may be derived from a range of sources such as the THERP database, simulation data, historical accident data, and expert judgment. Performance-shaping factors (PSFs) should be incorporated into these HEP calculations; the primary source of guidance for this is the THERP handbook. However, the analyst must use their own discretion when deciding the extent to which each of the factors applies to the task.
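One simple way to picture how PSFs enter the calculation is as multipliers applied to a nominal HEP. The sketch below assumes that multiplicative form; the function name, the PSF names and every numeric value are hypothetical and are not taken from the THERP handbook, which remains the source of guidance for real analyses.

```python
def adjust_hep(nominal_hep, psf_multipliers):
    """Scale a nominal HEP by a set of PSF multipliers.

    psf_multipliers maps a PSF name to a positive factor (illustrative values
    chosen by the analyst). The result is capped at 1.0 because it is a
    probability.
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must lie in [0, 1]")
    adjusted = nominal_hep
    for name, multiplier in psf_multipliers.items():
        if multiplier <= 0:
            raise ValueError(f"multiplier for {name!r} must be positive")
        adjusted *= multiplier
    return min(adjusted, 1.0)


# Hypothetical example: a nominal HEP of 0.003 under elevated stress.
print(adjust_hep(0.003, {"stress": 2.0, "experience": 1.0}))
```

The cap at 1.0 matters for large multipliers: without it, a heavily penalised HEP could exceed the valid range for a probability.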

4. Estimate the effects of human error on the system failure events

With the completion of the HRA, the human contribution to failure can then be assessed in comparison with the results of the overall reliability analysis. This can be completed by inserting the HEPs into the full system’s fault event tree, which allows human factors to be considered within the context of the full system.

5. Recommend changes to the system and recalculate the system failure probabilities

Once the human factor contribution is known, sensitivity analysis can be used to identify how HEPs can be reduced. Error recovery paths may be incorporated into the event tree as this will aid the assessor when considering the possible approaches by which the identified errors can be reduced.
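The following sketch illustrates both ideas under simplifying assumptions: steps are treated as independent, and each step carries a hypothetical probability that recovery fails (1.0 when no recovery path exists). It ranks steps by how much the task failure probability drops when a step's HEP is reduced by a fixed factor. All names and numbers are illustrative.

```python
def task_failure_probability(steps):
    """steps: list of (name, hep, p_recovery_fails) tuples.

    An error contributes to failure only if it is not recovered, so the
    unrecovered error probability for a step is hep * p_recovery_fails.
    """
    p_success = 1.0
    for _name, hep, p_recovery_fails in steps:
        p_success *= 1.0 - hep * p_recovery_fails
    return 1.0 - p_success


def rank_by_sensitivity(steps, reduction_factor=10.0):
    """Rank steps by the failure reduction gained from lowering each HEP."""
    baseline = task_failure_probability(steps)
    results = []
    for i, (name, hep, p_recovery_fails) in enumerate(steps):
        trial = list(steps)
        trial[i] = (name, hep / reduction_factor, p_recovery_fails)
        results.append((name, baseline - task_failure_probability(trial)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical steps: the second has a recovery path that fails 10% of the time.
steps = [("read gauge", 0.01, 1.0), ("open valve", 0.02, 0.1)]
for name, gain in rank_by_sensitivity(steps):
    print(f"{name}: failure probability reduced by {gain:.6f}")
```

In this example the step without a recovery path dominates, even though its HEP is lower, which is the kind of insight a sensitivity analysis is meant to surface.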

Worked example

Context

The following example illustrates how the THERP methodology can be used in practice in the calculation of human error probabilities (HEPs). It is used to determine the HEP for establishing air-based ventilation using emergency purge ventilation equipment on in-tank precipitation (ITP) processing tanks 48 and 49 after failure of the nitrogen purge system following a seismic event.

Assumptions

In order for the final HEP calculation to be valid, the following assumptions are required to be fulfilled:

  1. There exists a seismic event initiator that leads to the establishment of air-based ventilation on the ITP processing tanks 48 and 49, possibly 50 in some cases.
  2. It is assumed that both onsite and offsite power are unavailable, and therefore control actions are performed by the operator locally, on the tank top.
  3. The time available for operations personnel to establish air-based ventilation using the emergency purge ventilation, following the seismic event, is 3 days.
  4. An ITP equipment status monitoring procedure needs to be developed so that a consistent method is used to evaluate the ITP equipment and component status and selected process parameters during an accident condition.
  5. Assumed response times exist for the initial diagnosis of the event and for the placement of emergency purge ventilation equipment on the tank top. The former is 10 hours while the latter is 4 hours.
  6. The in-tank precipitation process has associated operational safety requirements (OSR) that identify the precise conditions under which the emergency purge ventilation equipment should be hooked up to the riser.
  7. The “tank 48 system” standard operating procedure has certain conditions and actions that must be included for correct completion to be performed (see file for more details).
  8. A vital component of the emergency purge ventilation equipment unit is a flow indicator; this is required in the event of the emergency purge ventilation equipment being hooked up incorrectly, as it would allow for a recovery action.
  9. The personnel available to perform the necessary tasks all possess the required skills.
  10. Throughout the installation of the emergency purge ventilation equipment, carried out by maintenance personnel, a tank operator must be present to monitor this process.

Method

The method considers various factors that may contribute to human errors and provides a systematic approach for evaluating and quantifying these probabilities.

Here are the key steps involved in the THERP method:

Task Analysis: The first step is to break down the overall task into discrete steps or stages. Each stage represents a specific activity or action performed by the human operator.

Error Identification: For each task stage, potential human errors are identified. These errors can result from a variety of factors, such as misinterpretation, distraction, or memory lapses.

Error Quantification: The next step is to assign probabilities to each identified error. These probabilities are based on historical data, expert judgment, or other relevant sources. THERP often uses a database of generic human error probabilities for different types of tasks.

Calculation of Overall Error Probability: The overall error probability for a task is calculated by combining the probabilities of individual errors at each stage. The method considers both independent and dependent errors, recognizing that the occurrence of one error may influence the likelihood of others.

Sensitivity Analysis: THERP allows for sensitivity analysis, which involves assessing the impact of variations in error probabilities on the overall result. This helps identify which factors have the most significant influence on the predicted human error rate.

Documentation and Reporting: The final step involves documenting the analysis, including the task breakdown, identified errors, assigned probabilities, and the overall predicted human error rate. This information is crucial for decision-makers and system designers.
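For the dependent errors mentioned under the overall error probability step, the THERP handbook (NUREG/CR-1278) is commonly quoted as defining five dependence levels, each with an equation for the conditional HEP of a task given failure on the preceding task. The sketch below encodes those equations as usually cited; the function and dictionary names are hypothetical, and the equations should be checked against the handbook before any real use.

```python
# Conditional HEP on task N given failure on task N-1, where n is the
# basic (unconditional) HEP of task N. Levels: zero, low, moderate,
# high and complete dependence.
DEPENDENCE_FORMULAS = {
    "ZD": lambda n: n,
    "LD": lambda n: (1 + 19 * n) / 20,
    "MD": lambda n: (1 + 6 * n) / 7,
    "HD": lambda n: (1 + n) / 2,
    "CD": lambda n: 1.0,
}


def conditional_hep(basic_hep, level):
    """Return the conditional HEP for the given dependence level."""
    if not 0.0 <= basic_hep <= 1.0:
        raise ValueError("basic_hep must lie in [0, 1]")
    if level not in DEPENDENCE_FORMULAS:
        raise ValueError(f"unknown dependence level: {level!r}")
    return DEPENDENCE_FORMULAS[level](basic_hep)


# Hypothetical basic HEP of 0.01 evaluated at every dependence level.
for level in ("ZD", "LD", "MD", "HD", "CD"):
    print(level, round(conditional_hep(0.01, level), 4))
```

Even low dependence raises a small basic HEP considerably, which is why ignoring dependence tends to make a combined estimate optimistic.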

THERP is widely used in industries where human performance is critical, such as nuclear power, aviation, and chemical processing. While THERP provides a systematic framework for human error prediction, it's important to note that the method relies on expert judgment and historical data, and its accuracy can be influenced by the quality of the input data and the expertise of the analysts.

Other HRA methods, such as the Human Error Assessment and Reduction Technique (HEART) and Bayesian network-based approaches, also exist, and the choice of method depends on the specific requirements and characteristics of the system being analyzed.

An initial task analysis was carried out on the normal procedure and standard operating procedure. This allowed the operator to align and then initiate the emergency purge ventilation equipment given the loss of the ventilation system. Thereafter, each individual task was analyzed, from which it was then possible to assign error probabilities and error factors to events that represented operator responses.

Figure: HRA event tree for aligning and starting emergency purge ventilation equipment on in-tank precipitation tanks 48 or 49 after a seismic event.

The sum of the individual failure path probabilities gives the total failure path probability (FT).
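The summation can be sketched as follows, where each failure path is the list of branch probabilities along it and the path probability is their product. The two-step tree and its HEP values are hypothetical and are not the values from the worked example.

```python
from math import prod


def total_failure_probability(failure_paths):
    """Sum the probabilities of mutually exclusive failure paths.

    Each path is a list of branch probabilities; its probability is the
    product of those branches.
    """
    return sum(prod(path) for path in failure_paths)


# Hypothetical two-step task with HEPs h1 and h2 and no recovery.
h1, h2 = 0.01, 0.02
failure_paths = [
    [h1],          # error on the first action
    [1 - h1, h2],  # first action succeeds, second action fails
]
print(f"FT = {total_failure_probability(failure_paths):.4f}")
```

Because the paths of an event tree are mutually exclusive, their probabilities can be added directly without correcting for overlap.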

Results

From the various figures and workings, it can be determined that the HEP for establishing air-based ventilation using the emergency purge ventilation equipment on in-tank precipitation processing tanks 48 and 49 after a failure of the nitrogen purge system following a seismic event is 4.2 × 10⁻⁶. This value is judged to be a median value on the lognormal scale. However, the result is valid only if all of the previously stated assumptions hold.
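To show what a lognormal median implies, the sketch below derives uncertainty bounds and a mean from a median and an error factor. It assumes the common convention that the error factor is the ratio of the 95th percentile to the median. The median is the value reported above; the error factor of 10 is purely hypothetical, since the source does not state one for the final result.

```python
from math import exp, log
from statistics import NormalDist

# Standard normal quantile for the 95th percentile (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def lognormal_summary(median, error_factor):
    """Return 5th/95th percentile bounds and the mean of a lognormal HEP.

    Assumes error_factor = 95th percentile / median (a common convention).
    """
    if median <= 0 or error_factor < 1:
        raise ValueError("median must be > 0 and error_factor must be >= 1")
    sigma = log(error_factor) / Z95  # standard deviation of ln(HEP)
    return {
        "lower_5th": median / error_factor,
        "upper_95th": median * error_factor,
        "mean": median * exp(sigma ** 2 / 2),
    }


# Median from the worked example; the error factor is an assumption.
summary = lognormal_summary(median=4.2e-6, error_factor=10.0)
for key, value in summary.items():
    print(f"{key}: {value:.3e}")
```

The mean of a lognormal distribution is always at least as large as its median, so a median HEP understates the expected value whenever the error factor exceeds 1.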

Advantages of THERP

Disadvantages of THERP

Other human reliability assessments

Other human reliability assessment (HRA) techniques have been created by a number of different researchers. They include cognitive reliability and error analysis method (CREAM), technique for human error assessment (THEA), cause-based decision tree (CBDT), human error repository and analysis (HERA), standardized plant analysis risk (SPAR), a technique for human event analysis (ATHEANA), hazard and operability study (HAZOP), system for predictive error analysis and reduction (SPEAR), and human error assessment and reduction technique (HEART). [8]

References

  1. Calixto, Eduardo (2016-01-01), Calixto, Eduardo (ed.), "Chapter 5 - Human Reliability Analysis", Gas and Oil Reliability Engineering (Second Edition), Boston: Gulf Professional Publishing, pp. 471–552, doi:10.1016/b978-0-12-805427-7.00005-1, ISBN 978-0-12-805427-7, retrieved 2023-12-20
  2. Kirwan, B. (1994) A Guide to Practical Human Reliability Assessment. CRC Press. ISBN 978-0748400522.
  3. Hollnagel, E. (2005) Human reliability assessment in context. Nuclear Engineering and Technology. 37(2). pp. 159-166.
  4. Swain, A.D. & Guttmann, H.E., Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. 1983, NUREG/CR-1278, USNRC.
  5. Humphreys, P. (1995). Human Reliability Assessor’s Guide. Human Factors in Reliability Group. ISBN 0853564205.
  6. Kirwan, B. (1996) The validation of three human reliability quantification techniques - THERP, HEART, JHEDI: Part I - technique descriptions and validation issues. Applied Ergonomics. 27(6) 359-373. doi:10.1016/S0003-6870(96)00044-0
  7. Kirwan, B. (1997) The validation of three human reliability quantification techniques - THERP, HEART, JHEDI: Part II - Results of validation exercise. Applied Ergonomics. 28(1) 17-25.
  8. DeMott, D.L. (2014?) "Human Reliability and the Cost of Doing Business". Annual Maintenance and Reliability Symposium.