Failure modes, effects, and diagnostic analysis


Failure modes, effects, and diagnostic analysis (FMEDA) is a systematic analysis technique to obtain subsystem / product level failure rates, failure modes and diagnostic capability. The FMEDA technique considers:


Given a reasonably accurate component database calibrated with field failure data, [1] the method can predict product-level failure rate and failure mode data for a given application. The predictions have been shown to be more accurate [2] than field warranty return analysis, or even typical field failure analysis, because those methods depend on reports that typically lack sufficiently detailed information in their failure records. [3]

The abstract of an FMEDA report typically states the Safe Failure Fraction (the rate of safe failures plus dangerous detected failures, divided by the total failure rate; that is, the fraction of all failures that are not dangerous undetected) and the Diagnostic Coverage (the rate of detected dangerous failures divided by the rate of all dangerous failures). Each term is defined equivalently in both standards, IEC 61508 and ISO 13849.
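As a minimal sketch, assuming the product failure rates have already been split into the four IEC 61508 categories (safe detected, safe undetected, dangerous detected, dangerous undetected), the two summary metrics can be computed as follows. The class and function names are illustrative rather than taken from any standard or tool, and the rates are made-up example values in FIT.

```python
from dataclasses import dataclass


@dataclass
class FailureRates:
    """Failure rates split into the four IEC 61508 categories.

    Any consistent unit works; FIT (failures per 10^9 hours) is common.
    """
    safe_detected: float         # lambda_SD
    safe_undetected: float       # lambda_SU
    dangerous_detected: float    # lambda_DD
    dangerous_undetected: float  # lambda_DU

    @property
    def total(self) -> float:
        return (self.safe_detected + self.safe_undetected
                + self.dangerous_detected + self.dangerous_undetected)


def safe_failure_fraction(r: FailureRates) -> float:
    """SFF = (lambda_S + lambda_DD) / lambda_total, i.e. all failures except dangerous undetected."""
    return (r.total - r.dangerous_undetected) / r.total


def diagnostic_coverage(r: FailureRates) -> float:
    """DC = lambda_DD / lambda_D: detected dangerous failures over all dangerous failures."""
    dangerous = r.dangerous_detected + r.dangerous_undetected
    return r.dangerous_detected / dangerous


# Illustrative rates in FIT, not taken from any real product.
rates = FailureRates(safe_detected=100.0, safe_undetected=250.0,
                     dangerous_detected=450.0, dangerous_undetected=50.0)
print(f"SFF = {safe_failure_fraction(rates):.1%}")  # SFF = 94.1%
print(f"DC  = {diagnostic_coverage(rates):.1%}")    # DC  = 90.0%
```

Note that a high SFF can be reached either through many safe failures or through good diagnostics, which is why reports usually quote both numbers.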

The name was coined by Dr. William M. Goble in 1994 for a technique that he and other engineers, now at exida, had been developing since 1988. [4]

Antecedents

A failure modes and effects analysis (FMEA) is a structured qualitative analysis of a system, subsystem, process, design or function to identify potential failure modes, their causes and their effects on system operation. The concept and practice of performing an FMEA have been around in some form since the 1960s. The practice was first formalized in the 1970s with the development of US MIL-STD-1629/1629A. In early practice, its use was limited to selected applications and industries where the cost of failure was particularly high. The primary benefits were to qualitatively evaluate the safety and reliability of a system, determine unacceptable failure modes, identify potential design improvements, plan maintenance activities and help understand system operation in the presence of potential faults. The failure modes, effects and criticality analysis (FMECA) was introduced to address a primary barrier to effective use of detailed FMEA results by adding a criticality metric. This let users of the analysis focus quickly on the most important failure modes and effects in terms of risk, so that improvements could be prioritized on the basis of cost/benefit comparisons.
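To illustrate how a criticality metric supports prioritization, the sketch below ranks failure modes by the failure mode criticality number of the quantitative approach in MIL-STD-1629A, C_m = β·α·λp·t. The failure modes and all numbers are hypothetical, and a real criticality analysis also assigns severity classes, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    part_failure_rate: float   # lambda_p, failures per hour
    mode_ratio: float          # alpha: fraction of the part's failures that occur in this mode
    effect_probability: float  # beta: conditional probability that the mode produces the effect


def mode_criticality(m: FailureMode, operating_hours: float) -> float:
    """Failure mode criticality number C_m = beta * alpha * lambda_p * t."""
    return m.effect_probability * m.mode_ratio * m.part_failure_rate * operating_hours


# Hypothetical failure modes with made-up data, for illustration only.
modes = [
    FailureMode("relay contact welded", 2e-6, 0.3, 1.0),
    FailureMode("relay coil open", 2e-6, 0.7, 0.1),
    FailureMode("sensor drift high", 5e-6, 0.2, 0.5),
]

# Rank modes so the most critical ones are reviewed first (t = one year of operation).
ranked = sorted(modes, key=lambda m: mode_criticality(m, 8760.0), reverse=True)
for m in ranked:
    print(f"{m.name}: Cm = {mode_criticality(m, 8760.0):.2e}")
```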

Development

The FMEDA technique was developed in the late 1980s by engineers now at exida, based in part on a paper presented at the 1984 RAMS Symposium. [5] The initial FMEDA added two pieces of information to the FMEA process. The first is quantitative failure data (failure rates and the distribution of failure modes) for every component being analyzed. The second is the probability that the system or subsystem detects internal failures through automatic on-line diagnostics. This is crucial to achieving and maintaining reliability in increasingly complex systems, and in systems that may not exercise all of their functionality under normal circumstances, such as a low-demand emergency shutdown system. There is a clear need for a measure of automatic diagnostic capability; this was recognized in the late 1980s. [6] In that context, the principles and basic methods of the modern FMEDA were first documented in the book Evaluating Control Systems Reliability. [7] The term FMEDA was first used in 1994, [8] and after further refinement the methods were published in the late 1990s. [9] [10] [11] The method was explained to members of the IEC 61508 committee in the late 1990s and was included in the standard as a method for determining failure rates, failure modes and diagnostic coverage for products.

FMEDA techniques were further refined during the 2000s, primarily during IEC 61508 preparation work. The key changes have been: 1. use of functional failure modes; 2. mechanical component usage; 3. prediction of manual proof test effectiveness; and 4. prediction of product useful life. With these changes, the FMEDA technique has matured to become more complete and useful.
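A minimal sketch of how the two added pieces of information combine in a worksheet: each row holds a component failure rate, the fraction of that rate falling in one failure mode, the mode's effect (safe or dangerous), and an assumed probability that on-line diagnostics detect it. Rolling the rows up yields the λSD, λSU, λDD and λDU totals. The row layout, names and numbers are hypothetical; practical FMEDA worksheets carry more columns and more categories.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    component: str
    mode: str
    component_rate_fit: float   # total component failure rate in FIT (failures per 1e9 hours)
    mode_fraction: float        # share of the component's failures that occur in this mode
    dangerous: bool             # effect of this mode in the application: True = dangerous
    diagnostic_coverage: float  # assumed probability that on-line diagnostics detect this mode


def roll_up(rows: list[WorksheetRow]) -> dict[str, float]:
    """Sum every row's failure rate into the SD, SU, DD and DU categories."""
    totals = {"SD": 0.0, "SU": 0.0, "DD": 0.0, "DU": 0.0}
    for r in rows:
        rate = r.component_rate_fit * r.mode_fraction
        detected = rate * r.diagnostic_coverage
        category = "D" if r.dangerous else "S"
        totals[category + "D"] += detected          # detected share
        totals[category + "U"] += rate - detected   # undetected share
    return totals


# Hypothetical worksheet for two components; all values are illustrative.
rows = [
    WorksheetRow("R1", "open", 2.0, 0.6, dangerous=True, diagnostic_coverage=0.9),
    WorksheetRow("R1", "short", 2.0, 0.4, dangerous=False, diagnostic_coverage=0.0),
    WorksheetRow("C3", "short", 5.0, 0.5, dangerous=True, diagnostic_coverage=0.99),
    WorksheetRow("C3", "open", 5.0, 0.5, dangerous=False, diagnostic_coverage=0.0),
]
print(roll_up(rows))
```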

Functional failure mode analysis

Also in the early 2000s, functional failure mode analysis was added to the FMEDA process by John C. Grebe. In early FMEDA work, component failure modes were mapped directly to the "safe" or "dangerous" categories of IEC 61508. This was relatively easy, since everything that was not "dangerous" was "safe". Once multiple failure mode categories existed, direct assignment became more difficult. In addition, it became clear that the category assignment might change if a product were used in a different application. With direct category assignment during the FMEDA, a new FMEDA was required for each new application or each variation in usage. Under the functional failure mode approach, the actual functional failure modes of the product are identified during an FMEA. During the detailed FMEDA, each component failure mode is mapped to a functional failure mode. The functional failure modes are then assigned to failure mode categories for each particular application. This eliminates the need to repeat the detailed component-level work when a new application is considered.
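The two-stage mapping described above can be sketched as two lookup tables: one from component failure modes to functional failure modes, built once, and a small per-application table from functional failure modes to categories. The component names, functional modes, rates and the high-trip/low-trip example are hypothetical illustrations, not data from any real product.

```python
from collections import defaultdict

# Stage 1, done once in the detailed FMEDA:
# (component, component failure mode) -> functional failure mode of the product.
COMPONENT_TO_FUNCTIONAL = {
    ("R1", "open"): "output stuck low",
    ("R1", "short"): "output stuck high",
    ("U2", "drift"): "output inaccurate",
}

# Component failure-mode rates in FIT (illustrative values).
COMPONENT_RATES = {("R1", "open"): 1.2, ("R1", "short"): 0.8, ("U2", "drift"): 3.0}

# Stage 2, done per application: functional failure mode -> category.
# In a high-trip application a stuck-high output causes a trip (safe), while a
# stuck-low output hides a real demand (dangerous); a low-trip application is the reverse.
HIGH_TRIP = {"output stuck high": "safe", "output stuck low": "dangerous",
             "output inaccurate": "dangerous"}
LOW_TRIP = {"output stuck high": "dangerous", "output stuck low": "safe",
            "output inaccurate": "dangerous"}


def categorize(rates, functional_map, application_map):
    """Roll component failure rates up to categories via the functional failure modes."""
    totals = defaultdict(float)
    for key, rate in rates.items():
        totals[application_map[functional_map[key]]] += rate
    return dict(totals)


# The stage-1 mapping is reused unchanged; only the small stage-2 table differs.
print(categorize(COMPONENT_RATES, COMPONENT_TO_FUNCTIONAL, HIGH_TRIP))
print(categorize(COMPONENT_RATES, COMPONENT_TO_FUNCTIONAL, LOW_TRIP))
```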

Mechanical FMEDA Techniques

It became clear in the early 2000s that many products used in safety-critical applications had mechanical components. An FMEDA done without considering these mechanical components was incomplete, misleading and potentially dangerous. The fundamental problem in applying the FMEDA technique to them was the lack of a mechanical component database that included part failure rates and failure mode distributions. Using a number of published reference sources, exida began developing a mechanical component database in 2003. [12] After a few years of research and refinement, [13] the database was published. [14] This has allowed the FMEDA to be used on combined electrical/mechanical components and on purely mechanical components.

Manual Proof Test Effectiveness

The FMEDA can predict the effectiveness of any defined manual proof test in the same way it predicts automatic diagnostic coverage. An additional column is added to the FMEDA, and the probability that the proof test detects each component failure mode is estimated. The cumulative effectiveness of the proof test is then calculated in the same way as automatic diagnostic coverage.
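Since both figures are calculated the same way, one rate-weighted function can serve for each; the sketch below shows that. It assumes the proof test column is evaluated over the dangerous failures left undetected by automatic diagnostics, and the function name and rates are hypothetical.

```python
def weighted_coverage(items: list[tuple[float, float]]) -> float:
    """Rate-weighted detection probability: sum(rate_i * p_i) / sum(rate_i).

    items holds (failure rate, detection probability) pairs, one per failure mode.
    """
    detected = sum(rate * p for rate, p in items)
    total = sum(rate for rate, _ in items)
    return detected / total


# Automatic diagnostic coverage: dangerous failure modes (FIT) and the
# probability that on-line diagnostics detect each one.
dangerous_modes = [(1.2, 0.9), (2.5, 0.99)]
print(f"Diagnostic coverage: {weighted_coverage(dangerous_modes):.1%}")

# Proof test coverage: the same calculation over the dangerous undetected rates,
# using the estimated proof-test detection column instead.
dangerous_undetected = [(0.12, 1.0), (0.025, 0.5)]
print(f"Proof test coverage: {weighted_coverage(dangerous_undetected):.1%}")
```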

Product Useful Life

As each component within a product is reviewed, those with a relatively short useful life are identified. One example is the electrolytic capacitor: many such capacitor designs have a useful life limitation of 10 years. Since the constant failure rates used in an FMEDA are valid only during the useful life period, this information is valuable for interpreting the limitations of FMEDA results.
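A minimal sketch of how the identified limits might be used, assuming the product's useful life is bounded by its shortest-lived component. The component list, lifetimes and mission time are hypothetical.

```python
def product_useful_life(component_lives: dict[str, float]) -> tuple[str, float]:
    """Return the component that limits useful life and its life in years.

    Assumes the product's useful life is bounded by its shortest-lived component.
    """
    limiting = min(component_lives, key=component_lives.get)
    return limiting, component_lives[limiting]


# Hypothetical components flagged during the review, with illustrative useful lives in years.
lives = {"C5 electrolytic capacitor": 10.0, "K1 relay": 15.0}
part, years = product_useful_life(lives)

mission_time_years = 12.0  # hypothetical planned service life
if mission_time_years > years:
    print(f"Constant failure rates not valid beyond {years} years (limited by {part})")
```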

The Future

Figure: FMEDA Comparison Studies

Further refinement of the component database, with selective calibration to different operating profiles, is needed. In addition, comparisons of FMEDA results with field failure studies have shown that human factors, especially maintenance procedures, affect the failure rates and failure modes of products.

As more data becomes available, the component database can be refined and updated as required by new technology and new knowledge. After further research and refinement, [15] an updated edition of the database has been published. [16] The success of the FMEDA technique in supplying needed data in a relatively accurate way has allowed the probabilistic, performance-based approach to design to work.

References

  1. Electrical & Mechanical Component Reliability Handbook. exida. 2006.
  2. Goble, William M.; Iwan van Beurden (2014). Combining field failure data with new instrument design margins to predict failure rates for SIS Verification (PDF) (Report). Proceedings of the 2014 International Symposium - BEYOND REGULATORY COMPLIANCE, MAKING SAFETY SECOND NATURE, Hilton College Station-Conference Center, College Station, Texas.
  3. Goble, W. M. Field Failure Data – the Good, the Bad and the Ugly (Report). Sellersville, PA: exida.
  4. "Dr. William Goble - CFSE - USA". exida.
  5. Collett, R.E.; Bachant, P.W. (1984). "Integration of BIT Effectiveness with FMECA". Annual Reliability and Maintainability Symposium, 1984. Proceedings. IEEE. pp. 300–305. doi:10.1109/RAMS.1984.764308.
  6. Amer, H. A.; McCluskey, E. J. (1987). Weighted Coverage in Fault-Tolerant Systems. IEEE. pp. 187–191.
  7. Goble, William M. (1992). Evaluating Control Systems Reliability: Techniques and Applications. ISA.
  8. FMEDA Analysis of CDM (Critical Discrete Module) – QUADLOG. Moore Products Company. 1994.
  9. Goble, W.M. (1998). The Use and Development of Quantitative Reliability and Safety Analysis in New Product Design. University Press, Eindhoven University of Technology, Netherlands.
  10. Goble, W.M. (1998). Control Systems Safety Evaluation and Reliability (2nd ed.). ISA.
  11. Goble, W.M.; A. C. Brombacher (1999). Using a Failure Modes, Effects and Diagnostic Analysis (FMEDA) to Measure Diagnostic Coverage in Programmable Electronic Systems. Reliability Engineering and System Safety, Vol. 66, No. 2.
  12. Goble, William M. (2003). Accurate Failure Metrics for Mechanical Instruments. Proceedings of IEC 61508 Conference. Augsburg, Germany: RWTUV.
  13. Goble, William M.; J.V. Bukowski (2007). Development of a Mechanical Component Failure Database. 2007 Proceedings of the Annual Reliability and Maintainability Symposium. New York, NY: IEEE.
  14. Electrical & Mechanical Component Reliability Handbook. exida. 2006.
  15. Goble, William M.; J.V. Bukowski (2007). Development of a Mechanical Component Failure Database. 2007 Proceedings of the Annual Reliability and Maintainability Symposium. New York, NY: IEEE.
  16. Electrical & Mechanical Component Reliability Handbook, Third Edition. exida. 2008. ISBN 978-1-934977-04-0.