Root cause analysis


In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. [1] It is widely used in IT operations, manufacturing, telecommunications, industrial process control, accident analysis (e.g., in aviation, [2] rail transport, or nuclear plants), medical diagnosis, the healthcare industry (e.g., for epidemiology), etc. Root cause analysis is a form of inductive inference (first create a theory, or root, based on empirical evidence, or causes) and deductive inference (test the theory, i.e., the underlying causal mechanisms, with empirical data).


RCA can be decomposed into four steps:

  1. Identify and describe the problem clearly
  2. Establish a timeline from the normal situation until the problem occurrence
  3. Distinguish between the root cause and other causal factors (e.g., via event correlation)
  4. Establish a causal graph between the root cause and the problem.
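The causal graph built in the last step can be searched mechanically. The following is a minimal sketch. It assumes the graph is stored as a dictionary that maps each effect to its direct causes. Candidate root causes are then the nodes reached by walking backwards from the problem that have no recorded causes of their own. The function name `find_root_causes` is hypothetical, and so is the example graph, which is loosely based on the machine example later in this article with an invented second branch.

```python
from collections import deque


def find_root_causes(causes: dict[str, list[str]], problem: str) -> list[str]:
    """Walk backwards (breadth-first) from `problem` through the causal graph and
    return every reachable node that has no recorded causes of its own."""
    roots: list[str] = []
    seen = {problem}
    queue = deque([problem])
    while queue:
        node = queue.popleft()
        parents = causes.get(node, [])
        if not parents and node != problem:
            roots.append(node)  # nothing recorded upstream: candidate root cause
        for parent in parents:
            if parent not in seen:  # guard against cycles and shared causes
                seen.add(parent)
                queue.append(parent)
    return roots


# Hypothetical graph: each effect maps to its direct causes.
graph = {
    "fuse blew": ["machine overloaded"],
    "machine overloaded": ["bearing under-lubricated"],
    "bearing under-lubricated": ["pump shaft worn", "lubrication check skipped"],
    "pump shaft worn": ["no scrap filter"],
}
print(find_root_causes(graph, "fuse blew"))
```

The traversal can only stop where the recorded graph stops. What counts as "root" therefore depends on how deep the investigation went, not on the algorithm (see the Challenges section).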

RCA generally serves as input to a remediation process whereby corrective actions are taken to prevent the problem from recurring. The name of this process varies between application domains. According to ISO/IEC 31010, RCA may include these techniques: Five whys, Failure mode and effects analysis (FMEA), Fault tree analysis, Ishikawa diagrams, and Pareto analysis.
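Pareto analysis, one of the techniques listed above, ranks problem categories by frequency so that effort goes first to the "vital few". The following is a minimal sketch. It assumes the conventional 80% cutoff and uses invented failure counts. The function name `pareto` and the category data are hypothetical.

```python
def pareto(counts: dict[str, int], threshold: float = 0.8) -> list[tuple[str, int, float]]:
    """Rank categories by count and return the shortest prefix whose cumulative
    share of all occurrences reaches `threshold`, as (category, count, share)."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    vital_few: list[tuple[str, int, float]] = []
    cumulative = 0
    for category, count in ranked:
        cumulative += count
        share = cumulative / total
        vital_few.append((category, count, round(share, 3)))
        if share >= threshold:
            break
    return vital_few


# Hypothetical failure counts pulled from a corrective action database.
failures = {"lubrication": 45, "electrical": 25, "operator error": 15, "sensor": 10, "other": 5}
for row in pareto(failures):
    print(row)
```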

Definitions

There are essentially two ways of repairing faults and solving problems in science and engineering.

Reactive management

Reactive management consists of reacting quickly after the problem occurs, by treating the symptoms. This type of management is implemented by reactive systems, [3] [4] self-adaptive systems, [5] self-organized systems, and complex adaptive systems. The goal here is to react quickly and alleviate the effects of the problem as soon as possible.

Proactive management

Proactive management, conversely, consists of preventing problems from occurring. Many techniques can be used for this purpose, ranging from good practices in design to analyzing in detail problems that have already occurred and taking actions to make sure they never recur. Speed is not as important here as the accuracy and precision of the diagnosis. The focus is on addressing the real cause of the problem rather than its effects.

Root cause analysis is often used in proactive management to identify the root cause of a problem, that is, the factor that was the leading cause. It is customary to refer to the "root cause" in singular form, but one or several factors may constitute the root cause(s) of the problem under study.

A factor is considered the "root cause" of a problem if removing it prevents the problem from recurring. Conversely, a "causal factor" is a contributing action that affects an incident/event's outcome but is not the root cause. Although removing a causal factor can benefit an outcome, it does not prevent its recurrence with certainty.
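These definitions amount to a counterfactual test: remove the factor and check whether the problem still occurs. The following is a minimal sketch. It assumes the problem can be modeled as a Boolean rule over the factors present, and that a handful of observed scenarios is representative. The rule, the factor names, and the function names are all hypothetical.

```python
# Hypothetical failure rule: scrap ingress is necessary, and either aggravating
# condition is enough to turn it into a failure.
def problem_occurs(factors: frozenset[str]) -> bool:
    return "scrap ingress" in factors and bool({"high load", "long runtime"} & factors)


def classify(factor: str, scenarios: list[frozenset[str]]) -> str:
    """Compare how many observed scenarios fail with and without `factor`."""
    before = sum(problem_occurs(s) for s in scenarios)
    after = sum(problem_occurs(s - {factor}) for s in scenarios)
    if before > 0 and after == 0:
        return "root cause"  # removing it prevents every observed recurrence
    if after < before:
        return "causal factor"  # removing it helps but does not prevent recurrence
    return "no observed effect"


scenarios = [
    frozenset({"scrap ingress", "high load"}),
    frozenset({"scrap ingress", "long runtime"}),
    frozenset({"scrap ingress", "high load", "long runtime"}),
]
for factor in ("scrap ingress", "high load", "long runtime"):
    print(factor, "->", classify(factor, scenarios))
```

The verdict only holds for the scenarios actually examined. An unobserved combination of factors could still produce the problem.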

The bow-tie risk assessment model offers a useful way to picture the relationship between proactive and reactive management. The event or accident sits at the center of the model. On the left are the anticipated hazards and the lines of defense put in place to prevent those hazards from causing events. These defenses are the regulatory requirements, applicable procedures, physical barriers, and cyber barriers that govern operations and prevent events. Root cause analysis can be used proactively on this side of the model to evaluate the effectiveness of those defenses. Actual performance is compared against the applicable requirements, performance gaps are identified, and the gaps are then closed to strengthen the defenses. If an event does occur, the analysis moves to the right side of the model. This is the reactive side, where the emphasis is on identifying the root causes and mitigating the damage.

Example

Imagine an investigation into a machine that stopped because it was overloaded and the fuse blew. [6] Investigation shows that the machine was overloaded because it had a bearing that was not being sufficiently lubricated. The investigation proceeds further and finds that the automatic lubrication mechanism had a pump that was not pumping sufficiently, hence the lack of lubrication. Investigation of the pump shows that it has a worn shaft. Investigation of why the shaft was worn discovers that there is not an adequate mechanism to prevent metal scrap getting into the pump. This enabled scrap to get into the pump and damage it.

The apparent root cause of the problem is that metal scrap can contaminate the lubrication system. Fixing this problem ought to prevent the whole sequence of events from recurring. The real root cause could be a design issue if there is no filter to prevent metal scrap from getting into the system. Alternatively, if there is a filter but it was blocked because of a lack of routine inspection, then the real root cause is a maintenance issue.

Compare this with an investigation that does not find the root cause. Replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. However, there is a risk that the problem will simply recur until the root cause is dealt with.

The analysis above does not include a cost/benefit analysis. For example, does the cost of replacing one or more machines exceed the cost of downtime until the fuse is replaced? This situation is sometimes referred to as the cure being worse than the disease. [7] [8]
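At its simplest, the missing cost/benefit step is arithmetic: compare the one-off cost of the permanent fix with the expected cost of living with recurrences over a planning horizon. The following is a minimal sketch with invented figures. The function name and all numbers are hypothetical. The comparison deliberately ignores discounting, uncertainty in the failure rate, and non-financial costs such as injury risk.

```python
def fix_pays_off(fix_cost: float, failures_per_year: float,
                 cost_per_failure: float, years: float) -> bool:
    """True if the permanent fix costs less than the expected recurrences
    (repairs plus downtime) it would avoid over `years`."""
    expected_recurrence_cost = failures_per_year * cost_per_failure * years
    return fix_cost < expected_recurrence_cost


# Hypothetical figures: two failures a year, each costing 8,000 in parts and downtime.
print(fix_pays_off(fix_cost=50_000, failures_per_year=2, cost_per_failure=8_000, years=5))   # fit a filter
print(fix_pays_off(fix_cost=200_000, failures_per_year=2, cost_per_failure=8_000, years=5))  # replace the machine
```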

As an unrelated example of the conclusions that can be drawn in the absence of a cost/benefit analysis, consider the tradeoffs of population decline. In the short term, there will be fewer payers into pension and retirement systems. Halting the decline, however, will require higher taxes to cover the cost of building more schools. This illustrates how the cure can be worse than the disease. [9]

Costs also extend beyond finances once the safety of the personnel who operate the machinery is taken into account. Ultimately, the goal is to prevent downtime, and even more so to prevent catastrophic injuries. Prevention begins with being proactive.

General principles

[Figure: Example of a root cause analysis method (tree diagram)]

Despite the different approaches among the various schools of root cause analysis and the specifics of each application domain, RCA generally follows the same sequence of steps:

  1. Identification and description: Effective problem statements and event descriptions (as failures, for example) are helpful and usually required to ensure the execution of appropriate root cause analyses. Problem statements are the North Star of the RCA. They keep the team focused on what it is investigating and prevent it from going astray.
  2. Gathering, organizing and analyzing information: Most RCAs begin with a fact-finding session to gather available information. This includes witness statements, the chronology of events, and the requirements applicable to the evolutions that were taking place at the time of the event. The information can be used to establish a sequence of events, or timeline, for the event. It can also be used to identify the lines of defense that should have prevented the event (i.e., the administrative requirements and the physical and cyber barriers). Available databases (such as corrective action program and safety program databases) should also be queried and analyzed. Data analysis tools that reveal performance gaps, such as Pareto charts, process maps, and fault trees, should be applied. Any number of data analysis tools can be brought to bear, including those from Lean Six Sigma, statistical analysis tools, and others such as hierarchical clustering and data-mining solutions (such as graph-theory-based data mining). Another approach is to compare the situation under investigation with past situations stored in case libraries, using case-based reasoning tools. This can include change analysis, comparative timeline analysis, and task analysis.
  3. Analysis of Defenses: Once the defenses that should have prevented the event or accident have been identified, an analysis of those defenses (traditionally called barrier analysis) is highly recommended in every case, including investigations that are not full RCAs. One method is to list the defenses on a chart or a virtual whiteboard. Then, for each defense, the information and data gathered are reviewed for evidence of that defense's effectiveness. The aim is to find deficiencies or gaps in performance where the administrative requirements were not met, or where the physical or cyber barriers were bypassed. These initial performance gaps are merely symptoms of deeper-seated causes. They are used to develop lines of inquiry questions, as outlined below, that trace the symptoms back to their points of origin (i.e., the root causes) using cause-and-effect analysis.
  4. Generating focused, unbiased lines of inquiry questions: By this point, the available information has been gathered, organized into charts with timelines and other data, and analyzed, and the defenses have been analyzed. These insights are now used to generate well-focused questions, which become the lines of inquiry for the cause-and-effect analysis. The questions must be unbiased. To prevent bias from the RCA team from tainting the investigation, each question should be tied to a specific defense or to a specific insight from the data analysis (e.g., Pareto charts, process maps, fault trees, control charts, and other tools that reveal performance gaps). There should be no curiosity questions. There should also be no questions that reflect confirmation bias (i.e., leading questions designed to elicit what the RCA team already believes the causes are). Finally, accusatory questions must be avoided, because they cause those helping the investigation to close down and withdraw.
  5. Cause-and-Effect Analysis: Once a robust set of lines of inquiry questions has been developed from the factual evidence collected, the applicable requirements, and the analysis of the available data, the questions can be taken to the organization's subject matter experts. This begins the cause-and-effect analysis. Each answer from the affected organization is used to pose follow-up Socratic questions. These keep the investigation moving down to the next deeper causal factors until the organization runs out of answers, or until the last causal factor is beyond the organization's control. An effective cause-and-effect analysis draws on many skills, including facilitation, communication, and Socratic questioning. When conducted properly, it takes the RCA down to the deepest-seated root causes. Some practitioners caution that the Ishikawa (fishbone) diagram and the five whys are not rigorous enough on their own for a root cause analysis. They note that these methods date from the 1940s and the 1930s respectively. They recommend more advanced methods developed since 2000, which they consider more likely to account for the dynamics of modern sociotechnical work environments.
  6. Charting the Results of the RCA: The best way to chart the results of an RCA investigation is to populate the final chart from the outset. Virtual whiteboards have made this much easier. A single virtual whiteboard can display the timelines, the lines of defense, the data analysis, the lines of inquiry questions, the cause-and-effect analysis, the root causes, and the corrective action plan.
  7. Corrective Actions to Prevent Recurrence: From a management perspective, the RCA effort is not complete without a comprehensive corrective action plan that addresses the root causes, the contributing factors, and the extent of cause. The corrective action plan should be developed by the issue owners and does not require participation by the RCA team, although the team is an excellent source of guidance for the issue owners. Extent-of-cause reviews determine the extent of the damage or impact that the root causes and contributing factors had on people, equipment, or facilities. These reviews are a weak point in many organizations and a common reason why RCAs and corrective action plans fail to prevent recurrence. Care must also be taken to avoid corrective action plans that simply add more administrative requirements and more training. The hierarchy of hazard controls and Lean mistake-proofing can serve as guidelines for developing corrective actions that are much more likely to prevent recurrence.
  8. Effectiveness Reviews: A predetermined period after the corrective action plan has been implemented, an effectiveness review is held to evaluate the corrective actions. This requires specifying a set of metrics or indicators that are monitored before and after the corrective actions are implemented, so that their impact can be measured. In most cases, the desired result is a significant reduction in the magnitude or frequency of the event or problem. If that result is not achieved, the RCA was not effective and must be reopened.
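The effectiveness review in step 8 compares an indicator before and after the corrective actions. The following is a minimal sketch. It assumes the indicator is the monthly event rate and that "significant" has been defined in advance as a fixed fractional reduction. The 50% target, the function name, and the counts are hypothetical. With counts this small, a control chart or a statistical test would give a more defensible verdict.

```python
def effectiveness_review(events_before: int, months_before: float,
                         events_after: int, months_after: float,
                         required_reduction: float = 0.5) -> tuple[float, bool]:
    """Compare monthly event rates before and after the corrective actions.
    Returns (fractional reduction in rate, whether the target was met)."""
    rate_before = events_before / months_before
    rate_after = events_after / months_after
    if rate_before == 0:
        raise ValueError("no baseline events; a frequency reduction is undefined")
    reduction = 1 - rate_after / rate_before
    return reduction, reduction >= required_reduction


# Hypothetical data: 12 events in the 12 months before, 2 events in the 6 months after.
reduction, effective = effectiveness_review(12, 12, 2, 6)
print(f"reduction={reduction:.0%}, effective={effective}")
```

If the target is missed, as with `effectiveness_review(12, 12, 5, 6)`, the RCA would be reopened.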

To be effective, root cause analysis must be performed systematically, which reduces the chance of missing important details. A team effort is typically required, and ideally all persons involved should arrive at the same conclusion. In aircraft accident analyses, for example, the conclusions of the investigation and the root causes that are identified must be backed up by documented evidence. [10]

Transition to corrective actions

The goal of RCA is to identify the root cause of the problem with the intent to stop the problem from recurring or worsening. The next step is to trigger long-term corrective actions to address the root cause identified during RCA, and make sure that the problem does not resurface. Correcting a problem is not formally part of RCA, however; these are different steps in a problem-solving process known as fault management in IT and telecommunications, repair in engineering, remediation in aviation, environmental remediation in ecology, therapy in medicine, etc.

Application domains

Root cause analysis is used in many application domains. RCA is specifically called out in many titles of the United States Code of Federal Regulations. For example:

  1. Title 10 (Energy): 10 CFR Part 50, Appendix B, Criterion XVI, “Corrective Actions” (also adopted by NQA-1):
    • “Measures shall be established to assure that conditions adverse to quality such as failures, malfunctions, deficiencies, defective material and equipment, and non-conformances are promptly identified and corrected.
    • In the case of significant conditions adverse to quality, the measures shall assure that the cause of the condition is determined, and corrective action taken to prevent recurrence.”
  2. Title 14 (Aeronautics and Space): 14 CFR Chapter III, Subchapter C, Part 437, Subpart C, §437.73, Anomaly recording, reporting and implementation of corrective actions:
    1. A permittee must record each anomaly that affects a safety-critical system, subsystem, process, facility, or support equipment.
    2. A permittee must identify all root causes of each anomaly and implement all corrective actions for each anomaly.
  3. Title 21 (Food and Drugs): 21 CFR Part 820, Subpart J, §820.100(a), Corrective and preventive action: (a) Each manufacturer shall establish and maintain procedures for implementing corrective and preventive action. The procedures shall include requirements for:
    1. Investigating the cause of nonconformities relating to product, processes, and the quality system;
    2. Identifying the action(s) needed to correct and prevent recurrence of nonconforming product and other quality problems;
    3. Verifying or validating the corrective and preventive action to ensure that such action is effective and does not adversely affect the finished device;
  4. Title 42 (Public Health): 42 CFR Part 488, Survey, Certification, and Enforcement Procedures, Subpart E (Survey and Certification of Long-Term Care Facilities):
    1. §488.61, Special procedures for approval and re-approval of organ transplant programs.
    2. ...Root Cause Analysis for patient deaths and graft failures, including factors the program has identified as likely causal or contributing factors for patient deaths and graft failures;

Manufacturing and industrial process control

The example above illustrates how RCA can be used in manufacturing. RCA is also routinely used in industrial process control, e.g. to control the production of chemicals (quality control).

RCA is also used for failure analysis in engineering and maintenance.

IT and telecommunications

Root cause analysis is frequently used in IT and telecommunications to detect the root causes of serious problems. For example, in the ITIL service management framework, the goal of incident management is to restore a faulty IT service as quickly as possible (reactive management). Problem management, by contrast, deals with solving recurring problems permanently by addressing their root causes (proactive management).

Another example is the computer security incident management process, where root-cause analysis is often used to investigate security breaches. [11]

RCA is also used in conjunction with business activity monitoring and complex event processing to analyze faults in business processes.

RCA in the IT industry cannot always be compared to RCA in safety-critical industries, because in IT it is normally not supported by pre-existing fault trees or other design specifications. Instead, the analysis is usually supported by a mixture of debugging, event-based detection, and monitoring systems in which services are modeled individually. Training and supporting tools, such as simulations or in-depth runbooks for all expected scenarios, typically do not exist in advance. They are created after the fact for issues deemed 'worthy'. As a result, the analysis is often limited to components that have monitoring or observation interfaces. It does not address the actual planned and observed function, with a focus on verifying inputs and outputs. Hence the saying "there is no root cause" has become common in the IT industry.

Health and safety

In the domains of health and safety, RCA is routinely used in medicine (diagnosis) and epidemiology (e.g., to identify the source of an infectious disease), where causal inference methods often require both clinical and statistical expertise to make sense of the complexities of the processes. [12]

RCA is used in environmental science (e.g., to analyze environmental disasters), accident analysis (aviation and rail industry), and occupational safety and health. [13] In the manufacture of medical devices, [14] pharmaceuticals, [15] food, [16] and dietary supplements, [17] root cause analysis is a regulatory requirement.

Systems analysis

RCA is also used in change management, risk management, and systems analysis.

Challenges

Without delving into the idiosyncrasies of specific problems, several general conditions can make RCA more difficult than it may appear at first sight.

First, important information is often missing because it is generally not possible, in practice, to monitor everything and store all monitoring data for a long time.

Second, gathering data and evidence, and arranging them along a timeline of events leading to the final problem, can be nontrivial. In telecommunications, for instance, distributed monitoring systems typically manage between a million and a billion events per day. Finding a few relevant events in such a mass of irrelevant events is akin to finding the proverbial needle in a haystack.
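Event correlation, named among the RCA steps at the start of this article, is one way to narrow such a stream down. The following is a minimal sketch of its simplest form: a time-window and severity filter followed by a frequency ranking. The event fields (`time`, `source`, `severity`), the thresholds, the function name, and the data are all hypothetical. Production correlators also use topology, rules, and deduplication, none of which this sketch includes. A source that appears often is a lead for the investigation, not proof of causation.

```python
from collections import Counter
from datetime import datetime, timedelta


def correlate(events: list[dict], failure_time: datetime,
              window: timedelta = timedelta(minutes=10),
              min_severity: int = 3) -> list[tuple[str, int]]:
    """Keep events that occurred within `window` before the failure and meet the
    severity floor, then rank their sources by how often they appear."""
    relevant = [
        e for e in events
        if failure_time - window <= e["time"] <= failure_time
        and e["severity"] >= min_severity
    ]
    return Counter(e["source"] for e in relevant).most_common()


# Hypothetical event stream around a failure at 12:00.
failure_time = datetime(2024, 1, 1, 12, 0)
events = [
    {"time": failure_time - timedelta(hours=2), "source": "router-1", "severity": 5},    # too early
    {"time": failure_time - timedelta(minutes=8), "source": "link-7", "severity": 4},
    {"time": failure_time - timedelta(minutes=5), "source": "link-7", "severity": 4},
    {"time": failure_time - timedelta(minutes=3), "source": "switch-2", "severity": 3},
    {"time": failure_time - timedelta(minutes=1), "source": "switch-2", "severity": 1},  # below floor
    {"time": failure_time + timedelta(minutes=1), "source": "link-7", "severity": 5},    # after failure
]
print(correlate(events, failure_time))
```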

Third, there may be more than one root cause for a given problem, and this multiplicity can make the causal graph very difficult to establish.

Fourth, causal graphs often have many levels, and root cause analysis terminates at a level that is "root" in the eyes of the investigator. Returning to the industrial example above, a deeper investigation could reveal a mismatch in inspection intervals. The plant's maintenance procedures called for periodic inspection of the lubrication subsystem every two years, while the current lubrication subsystem vendor's product specified a six-month interval. The switch of vendors may have been driven by management's desire to save money, combined with a failure to consult engineering staff on the implications of the change for maintenance procedures. Thus, while fixing the "root cause" identified above may have prevented the recurrence described, it would not have prevented other, perhaps more severe, failures affecting other machines.


Notes

  1. See Wilson, Dell & Anderson 1993, pp. 8–17.
  2. See IATA 2016 and Sofema 2017.
  3. See Manna & Pnueli 1995.
  4. See Lewerentz & Lindner 1995.
  5. See Babaoglu et al. 2005.
  6. See Ohno 1988.
  7. "The Cure Worse Than the Disease". The New York Times. 5 November 1927.
  8. Andrew C. Revkin (7 December 2000). "Dredging River's PCB's Could Be a Cure Worse Than the disease, G. E. insists". The New York Times.
  9. Phillip Longman (9 June 2004). "The Global Baby Bust". The New York Times.
  10. See IATA 2016.
  11. See Abubakar et al. 2016.
  12. Landsittel, Douglas; Srivastava, Avantika; Kropf, Kristin (2020). "A Narrative Review of Methods for Causal Inference and Associated Educational Resources". Quality Management in Health Care. 29 (4): 260–269. doi:10.1097/QMH.0000000000000276. ISSN 1063-8628. PMID 32991545. S2CID 222146291.
  13. See OSHA 2019.
  14. Office of Regulatory Affairs (26 December 2019). "Corrective and Preventive Actions (CAPA)". FDA.
  15. US-FDA. "CURRENT GOOD MANUFACTURING PRACTICE FOR FINISHED PHARMACEUTICALS". Electronic Code of Federal Regulations (eCFR). Retrieved 28 December 2020.
  16. US-FDA. "CURRENT GOOD MANUFACTURING PRACTICE, HAZARD ANALYSIS, AND RISK-BASED PREVENTIVE CONTROLS FOR HUMAN FOOD". Electronic Code of Federal Regulations (eCFR). Retrieved 28 December 2020.
  17. US-FDA. "CURRENT GOOD MANUFACTURING PRACTICE IN MANUFACTURING, PACKAGING, LABELING, OR HOLDING OPERATIONS FOR DIETARY SUPPLEMENTS". Electronic Code of Federal Regulations (eCFR). Retrieved 28 December 2020.

Related Research Articles

Safety engineering: engineering discipline which assures that engineered systems provide acceptable levels of safety

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Ishikawa diagram: causal diagrams created by Kaoru Ishikawa

Ishikawa diagrams are causal diagrams created by Kaoru Ishikawa that show the potential causes of a specific event.

Fault tree analysis: failure analysis system used in safety engineering and reliability engineering

Fault tree analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk, and to determine event rates of a safety accident or a particular system-level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries. It is also used in fields as diverse as risk factor identification relating to social service system failure. In software engineering, FTA is used for debugging purposes and is closely related to the cause-elimination technique used to detect bugs.
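To illustrate how a fault tree yields an event rate, the following minimal sketch evaluates a small tree built from AND and OR gates. It assumes the basic events are independent and that each appears only once in the tree. Real tools handle repeated events through minimal cut sets, which this sketch does not. The tree, the probabilities, and the function name are hypothetical.

```python
from math import prod


def top_event_probability(node) -> float:
    """Evaluate a fault tree bottom-up. A node is either a basic-event probability
    or a gate tuple ("AND" | "OR", [children]); basic events are assumed independent
    and each appears only once in the tree."""
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [top_event_probability(child) for child in children]
    if gate == "AND":  # all inputs must fail
        return prod(probs)
    if gate == "OR":  # at least one input fails: 1 - P(none fail)
        return 1 - prod(1 - p for p in probs)
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical tree: lubrication is lost if (main pump AND backup pump fail) OR the supply line blocks.
tree = ("OR", [("AND", [0.01, 0.1]), 0.001])
print(f"{top_event_probability(tree):.6f}")
```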

Failure mode and effects analysis: analysis of potential system failures

Failure mode and effects analysis is the process of reviewing as many components, assemblies, and subsystems as possible to identify potential failure modes in a system and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. An FMEA can be a qualitative analysis, but may be put on a quantitative basis when mathematical failure rate models are combined with a statistical failure mode ratio database. It was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study.
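Many FMEA worksheets prioritize failure modes with a risk priority number (RPN). The RPN is the product of the severity, occurrence, and detection ratings, each commonly scored from 1 to 10. The following is a minimal sketch of such a worksheet with invented rows and ratings. The RPN is only one prioritization scheme, and it is often criticized because very different rating combinations can produce the same product.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int    # 1 = negligible effect ... 10 = hazardous
    occurrence: int  # 1 = remote ... 10 = very frequent
    detection: int   # 1 = almost certain to be detected ... 10 = undetectable

    def __post_init__(self) -> None:
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError(f"rating out of range 1-10: {rating}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet rows.
worksheet = [
    FailureMode("scrap filter", "clogged", 6, 4, 3),
    FailureMode("lubrication pump", "shaft wear", 7, 5, 6),
    FailureMode("fuse", "nuisance trip", 3, 3, 2),
]
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.item:<18} {fm.mode:<14} RPN={fm.rpn}")
```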

A sentinel event is "any unanticipated event in a healthcare setting that results in death or serious physical or psychological injury to a patient, not related to the natural course of the patient's illness". Sentinel events can be caused by major mistakes and negligence on the part of a healthcare provider, and are closely investigated by healthcare regulatory authorities. Sentinel events are identified under The Joint Commission (TJC) accreditation policies to aid in root cause analysis and to assist in the development of preventive measures. The Joint Commission tracks events in a database to ensure that events are adequately analyzed, and that undesirable trends or decreases in performance are caught early and mitigated.

Troubleshooting is a form of problem solving, often applied to repair failed products or processes on a machine or a system. It is a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again. Troubleshooting is needed to identify the symptoms. Determining the most likely cause is a process of elimination—eliminating potential causes of a problem. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time, OR will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
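A common simplifying assumption is a constant failure rate λ, which means exponentially distributed times to failure. Under that assumption, reliability over a mission of length t is R(t) = exp(-λt). The long-run (steady-state) availability of a repairable item is MTBF / (MTBF + MTTR). The following is a minimal sketch with hypothetical figures. Items that show wear-out or early-life failures need other distributions, such as the Weibull.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) = exp(-lambda * t): probability of operating t hours without failure,
    assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * t_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the item is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical item: MTBF of 1,000 h (lambda = 1/1000 per hour) and a 10 h mean repair time.
print(round(reliability(100, 1 / 1000), 4))           # chance of surviving a 100 h mission
print(round(steady_state_availability(1000, 10), 4))  # long-run availability
```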

Futures techniques are used in the multi-disciplinary field of futures studies by futurists in the Americas and Australasia, and in futurology by futurologists in the EU. They include a diverse range of forecasting methods, including anticipatory thinking, backcasting, simulation, and visioning. Some of the anticipatory methods include the Delphi method, causal layered analysis, environmental scanning, morphological analysis, and scenario planning.

Predictive maintenance: method to predict when equipment should be maintained

Predictive maintenance techniques are designed to help determine the condition of in-service equipment in order to estimate when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. Thus, it is regarded as condition-based maintenance carried out as suggested by estimations of the degradation state of an item.

A failure reporting, analysis, and corrective action system (FRACAS) is a system, sometimes carried out using software, that provides a process for reporting, classifying, and analyzing failures, and for planning corrective actions in response to those failures. It is typically used in an industrial environment to collect data, and to record and analyze system failures. A FRACAS may attempt to manage multiple failure reports and produce a history of failures and corrective actions. FRACAS records the problems related to a product or process and their associated root causes and failure analyses to assist in identifying and implementing corrective actions.

Eight Disciplines Methodology (8D) is a method or model developed at Ford Motor Company used to approach and to resolve problems, typically employed by quality engineers or other professionals. Focused on product and process improvement, its purpose is to identify, correct, and eliminate recurring problems. It establishes a permanent corrective action based on statistical analysis of the problem and on the origin of the problem by determining the root causes. Although it originally comprised eight stages, or 'disciplines', it was later augmented by an initial planning stage. 8D follows the logic of the PDCA cycle. The disciplines are:

Accident analysis: process to determine the causes of accidents to prevent recurrence

Accident analysis is a process carried out in order to determine the cause or causes of an accident so as to prevent further accidents of a similar kind. It is part of accident investigation or incident investigation. These analyses may be performed by a range of experts, including forensic scientists, forensic engineers or health and safety advisers. Accident investigators, particularly those in the aircraft industry, are colloquially known as "tin-kickers". Health and safety and patient safety professionals prefer using the term "incident" in place of the term "accident". Because of its retrospective nature, accident analysis is primarily an exercise of directed explanation. It is conducted using the theories or methods the analyst has to hand, which direct the way in which the events, aspects, or features of accident phenomena are highlighted and explained. These analyses are also invaluable in determining ways to prevent future incidents from occurring. By determining root causes, they provide insight into what failures occurred that led to the incident.

A preventive action is a change implemented to address a weakness in a management system that is not yet responsible for causing nonconforming product or service.

Corrective and preventive action consists of improvements to an organization's processes taken to eliminate causes of non-conformities or other undesirable situations. It is usually a set of actions, laws or regulations that an organization is required to take in manufacturing, documentation, procedures, or systems to rectify and eliminate recurring non-conformance. Non-conformance is identified after systematic evaluation and analysis of the root cause of the non-conformance. Non-conformance may be a market complaint, a customer complaint, a failure of machinery or of a quality management system, or a misinterpretation of written instructions to carry out work. The corrective and preventive action is designed by a team that includes quality assurance personnel and personnel involved at the actual observation point of non-conformance. It must be systematically implemented and observed for its ability to eliminate further recurrence of such non-conformance. The Eight disciplines problem solving method, or 8D framework, can be used as an effective method of structuring a CAPA.

Event correlation is a technique for making sense of a large number of events and pinpointing the few events that are really important in that mass of information. This is accomplished by looking for and analyzing relationships between events.

The system safety concept calls for a risk management strategy based on the identification and analysis of hazards and the application of remedial controls using a systems-based approach. This is different from traditional safety strategies, which rely on control of the conditions and causes of an accident based either on epidemiological analysis or on the investigation of individual past accidents. The concept of system safety is useful in demonstrating the adequacy of technologies when difficulties are faced with probabilistic risk analysis. The underlying principle is one of synergy: a whole is more than the sum of its parts. A systems-based approach to safety requires the application of scientific, technical and managerial skills to hazard identification, hazard analysis, and the elimination, control, or management of hazards throughout the life-cycle of a system, program, project, activity, or product. "Hazop" is one of several techniques available for identification of hazards.

Human factors are the physical or cognitive properties of individuals, or social behavior which is specific to humans, and which influence functioning of technological systems as well as human-environment equilibria. The safety of underwater diving operations can be improved by reducing the frequency of human error and the consequences when it does occur. Human error can be defined as an individual's deviation from acceptable or desirable practice which culminates in undesirable or unexpected results. Human factors include both the non-technical skills that enhance safety and the non-technical factors that contribute to undesirable incidents that put the diver at risk.

[Safety is] An active, adaptive process which involves making sense of the task in the context of the environment to successfully achieve explicit and implied goals, with the expectation that no harm or damage will occur. – G. Lock, 2022

Dive safety is primarily a function of four factors: the environment, equipment, individual diver performance and dive team performance. The water is a harsh and alien environment which can impose severe physical and psychological stress on a diver. The remaining factors must be controlled and coordinated so the diver can overcome the stresses imposed by the underwater environment and work safely. Diving equipment is crucial because it provides life support to the diver, but the majority of dive accidents are caused by individual diver panic and an associated degradation of the individual diver's performance. – M.A. Blumenberg, 1996

Event tree analysis (ETA) is a forward, top-down, logical modeling technique for both success and failure that explores responses through a single initiating event and lays a path for assessing probabilities of the outcomes and overall system analysis. This analysis technique is used to analyze the effects of functioning or failed systems given that an event has occurred.
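The forward logic of an event tree can be sketched as an enumeration of success and failure paths through the safety barriers that follow an initiating event. The following is a minimal sketch. It assumes the barriers fail independently with a fixed probability on demand. Every branch is enumerated, whereas practical event trees often prune branches whose later outcomes do not matter. The scenario, probabilities, and function name are hypothetical. As a built-in check, the outcome frequencies should sum to the initiating frequency.

```python
from itertools import product


def event_tree(initiating_frequency: float,
               barriers: list[tuple[str, float]]) -> dict[str, float]:
    """Enumerate every success/failure path through the barriers, in order, and
    return the frequency of each outcome, assuming the barriers fail independently.
    `barriers` holds (name, probability of failure on demand) pairs."""
    outcomes: dict[str, float] = {}
    for path in product((True, False), repeat=len(barriers)):  # True = barrier works
        frequency = initiating_frequency
        labels = []
        for works, (name, p_fail) in zip(path, barriers):
            frequency *= (1 - p_fail) if works else p_fail
            labels.append(f"{name} {'works' if works else 'fails'}")
        outcomes[", ".join(labels)] = frequency
    return outcomes


# Hypothetical: a fire starts 0.1 times per year; the alarm and sprinkler may fail on demand.
for outcome, freq in event_tree(0.1, [("alarm", 0.05), ("sprinkler", 0.02)]).items():
    print(f"{outcome:<30} {freq:.5f} per year")
```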

Root Cause Analysis Solver Engine (RCASE) is a proprietary algorithm developed from research originally carried out at the Warwick Manufacturing Group (WMG) at the University of Warwick. RCASE development commenced in 2003 to provide an automated version of root cause analysis, the method of problem solving that tries to identify the root causes of faults or problems. RCASE is now owned by the spin-out company Warwick Analytics, where it is applied in automated predictive analytics software.

Tripod Beta is an incident and accident analysis methodology made available by the Stichting Tripod Foundation via the Energy Institute. The methodology is designed to help an accident investigator analyse the causes of an incident or accident in conjunction with conducting the investigation. This helps direct the investigation as the investigator will be able to see where more information is needed about what happened, or how or why the incident occurred.

References