Failure modes, effects, and diagnostic analysis

Failure modes, effects, and diagnostic analysis (FMEDA) is a systematic analysis technique used to obtain subsystem- and device-level failure rates, failure modes, and diagnostic capability.

Given a component database calibrated with field failure data that is reasonably accurate, [1] the method can predict device-level failure rate per failure mode, useful life, automatic diagnostic effectiveness, and latent fault test effectiveness for a given application. The predictions have been shown to be more accurate [2] than field warranty return analysis or even typical field failure analysis, because those methods depend on reports whose failure records typically lack sufficient detail. [3]

An FMEDA can predict failure rates per defined failure modes. For Functional Safety applications the IEC 61508 failure modes (safe, dangerous, annunciation, and no effect) are used. These failure rate numbers can be converted into the alternative failure modes from the automotive functional safety standard, ISO 26262.

The FMEDA name was given by Dr. William M. Goble in 1994 to the technique that had been in development since 1988 by Dr. Goble and other engineers now at exida. [4]

Antecedents

A design failure modes and effects analysis (DFMEA) is a structured qualitative analysis of a system, subsystem, or device design to identify potential failure modes and their effects on correct operation. The concept and practice of performing a DFMEA have been around in some form since the 1960s. The practice was first formalized in the 1970s with the development of US MIL-STD-1629/1629A.

A variation of DFMEA developed for functional safety applications is called Design Deviation and Mitigation Analysis (DDMA). [5] The DDMA variation adds information not normally included in a DFMEA, such as automatic diagnostic mitigations, latent fault tests, and useful life. DDMA omits risk priority numbers (RPNs), which are replaced by FMEDA results.

Development

The FMEDA technique was developed in the late 1980s by exida engineers, based in part on a paper presented at the 1984 RAMS Symposium. [6] The initial FMEDA added two pieces of information to the FMEA process. The first is quantitative failure data (failure rates and the distribution of failure modes) for all components being analyzed. The second is the probability that the system or subsystem detects internal failures via automatic on-line diagnostics. The need to measure automatic diagnostic effectiveness was recognized in the late 1980s. [7] Functional safety failure modes were added and first documented in the book Evaluating Control Systems Reliability. [8] The term FMEDA was first used in 1994, [9] and after further refinement the methods were published in the late 1990s. [10] [11] [12] The method was explained to members of the IEC 61508 committee in the late 1990s and was included in the standard as a method of determining failure rate, failure mode, and diagnostic coverage for devices.

FMEDA techniques were further refined during the 2000s, primarily during IEC 61508 preparation work. The key changes have been: 1. use of functional failure modes; 2. mechanical component usage; 3. prediction of latent fault test effectiveness; and 4. prediction of product useful life.
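The two pieces of information that an FMEDA adds to the FMEA process, per-component failure data and the probability of diagnostic detection, can be combined in a worksheet-style calculation. The following Python sketch is illustrative only and is not the exida method or any published tool: the component names, failure rates (in FIT, failures per 10^9 hours), failure mode distributions, category assignments, and detection probabilities are invented, and `FailureMode` and `summarize` are hypothetical names. Only the "safe" and "dangerous" categories are modeled, and diagnostic coverage is taken as the detected share of the dangerous failure rate.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    rate_fit: float   # total component failure rate in FIT (invented value)
    fraction: float   # share of the component's failures that occur in this mode
    category: str     # "safe" or "dangerous" (assumed assignment)
    detection: float  # probability that automatic diagnostics detect this mode


# Hypothetical worksheet rows; every number here is an assumption.
WORKSHEET = [
    FailureMode("R1 resistor", "open", 2.0, 0.75, "dangerous", 0.9),
    FailureMode("R1 resistor", "short", 2.0, 0.25, "safe", 0.0),
    FailureMode("C1 capacitor", "short", 10.0, 0.5, "dangerous", 0.5),
    FailureMode("C1 capacitor", "open", 10.0, 0.5, "safe", 0.0),
]


def summarize(rows):
    """Sum failure rates into safe/dangerous, detected/undetected totals."""
    totals = {"SD": 0.0, "SU": 0.0, "DD": 0.0, "DU": 0.0}
    for row in rows:
        mode_rate = row.rate_fit * row.fraction
        prefix = "S" if row.category == "safe" else "D"
        totals[prefix + "D"] += mode_rate * row.detection
        totals[prefix + "U"] += mode_rate * (1.0 - row.detection)
    dangerous = totals["DD"] + totals["DU"]
    # Diagnostic coverage: detected fraction of the dangerous failure rate.
    totals["DC"] = totals["DD"] / dangerous if dangerous else 0.0
    return totals


result = summarize(WORKSHEET)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Keeping one row per component failure mode mirrors the spreadsheet layout the technique is usually described with, so further columns can be added without changing the summation.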

Functional failure mode analysis

In the early 2000s functional failure mode analysis was added to the FMEDA process by John C. Grebe. In early FMEDA work, component failure modes were mapped directly to "safe" or "dangerous" categories per IEC 61508, 1st Edition. This was relatively easy since everything that was not "dangerous" was "safe." With multiple failure mode categories now existing, direct assignment became more difficult. In addition, it became clear that the category assignment might change if a product were used in different applications. With direct failure mode category assignment during the FMEDA, a new FMEDA was required for each new application or each variation in usage. Under the functional failure mode approach, the actual functional failure modes of the product are identified during a DFMEA. During the detailed FMEDA, each component failure mode is mapped to a functional failure mode. The functional failure modes are then categorized according to product failure mode in a particular application. [13]
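The two-stage mapping can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure from the cited reference: the component failure modes, functional failure modes, failure rates (in FIT), and the two example applications are all invented, and `functional_rates` and `categorize` are hypothetical names. The point it shows is that the component-to-functional mapping is done once, while the category assignment is a small table that is swapped per application.

```python
from collections import defaultdict

# Stage 1 (done once per product): each component failure mode is mapped to a
# functional failure mode. Rates are in FIT and are invented for illustration.
COMPONENT_MODES = [
    ("Q1 transistor short", "output stuck high", 4.0),
    ("Q1 transistor open", "output stuck low", 3.0),
    ("U2 reference drift", "output drifts", 2.0),
    ("D3 indicator LED open", "no effect on output", 1.0),
]

# Stage 2 (done per application): each functional failure mode is assigned a
# product failure mode category. Both applications are hypothetical.
APPLICATIONS = {
    "trip on high signal": {
        "output stuck high": "safe",
        "output stuck low": "dangerous",
        "output drifts": "dangerous",
        "no effect on output": "no effect",
    },
    "trip on low signal": {
        "output stuck high": "dangerous",
        "output stuck low": "safe",
        "output drifts": "dangerous",
        "no effect on output": "no effect",
    },
}


def functional_rates(component_modes):
    """Sum component failure rates by functional failure mode."""
    rates = defaultdict(float)
    for _name, functional_mode, fit in component_modes:
        rates[functional_mode] += fit
    return dict(rates)


def categorize(rates, category_map):
    """Sum functional failure mode rates by application-specific category."""
    totals = defaultdict(float)
    for functional_mode, fit in rates.items():
        totals[category_map[functional_mode]] += fit
    return dict(totals)


RATES = functional_rates(COMPONENT_MODES)
BY_APPLICATION = {
    name: categorize(RATES, mapping) for name, mapping in APPLICATIONS.items()
}
for name, totals in BY_APPLICATION.items():
    print(name, totals)
```

In this sketch the same `RATES` table serves both applications; only the small category map differs, which is why a new detailed FMEDA is not needed for each usage.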

Mechanical FMEDA techniques

It became clear in the early 2000s that many products being used in safety-critical applications had mechanical components. An FMEDA done without considering these mechanical components was incomplete, misleading, and potentially dangerous. The fundamental problem in applying the FMEDA technique was the lack of a mechanical component database that included part failure rates and failure mode distributions. Using a number of published reference sources, exida began development of a mechanical component database in 2003. [14] After a few years of research and refinement, [15] the database was published. [16] This has allowed the FMEDA to be used on combined electrical/mechanical components and on purely mechanical components.

Latent fault test effectiveness

The FMEDA can predict the effectiveness of any defined latent fault test in the same way it can predict automatic diagnostic coverage. An additional column may be added to an FMEDA spreadsheet, in which the probability of detection is estimated for each component failure mode. The cumulative effectiveness of the latent fault test (proof test) is calculated in the same way as automatic diagnostic coverage. FMEDA tools can also calculate latent fault test effectiveness.
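The extra-column calculation can be sketched in Python as below. This is an illustration under assumptions rather than a reproduction of any FMEDA tool: the rows, failure rates (in FIT), and detection probabilities are invented, `proof_test_coverage` is a hypothetical name, and the sketch assumes the test is scored only against failures that automatic diagnostics did not detect, weighting each detection probability by its failure rate.

```python
# Each row: (component failure mode, dangerous undetected failure rate in FIT,
# estimated probability that the latent fault test detects it).
# All values are invented for illustration.
ROWS = [
    ("valve seat leak", 40.0, 0.5),
    ("actuator spring fracture", 10.0, 1.0),
    ("solenoid stuck", 50.0, 0.9),
]


def proof_test_coverage(rows):
    """Rate-weighted fraction of latent failures revealed by the test."""
    total = sum(rate for _mode, rate, _p in rows)
    detected = sum(rate * p for _mode, rate, p in rows)
    return detected / total if total else 0.0


coverage = proof_test_coverage(ROWS)
print(f"latent fault test coverage: {coverage:.2f}")
```

Weighting by failure rate means a test that misses a rare failure mode is penalized less than one that misses a common mode, which is the same logic used for automatic diagnostic coverage.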

Device useful life

As each component within a product is reviewed, those with a relatively short useful life span are identified. One example of this is an electrolytic capacitor. Many designs have a useful life limitation of 10 years. Since constant failure rates are only valid during the useful life period, this metric is valuable for interpreting FMEDA result limitations.
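A simple way to picture this step is to take the shortest component useful life as the limit for the whole device. The Python sketch below assumes exactly that rule, which is an illustration and not a method stated in the source; only the 10-year electrolytic capacitor figure comes from the text, the other values are invented, and `device_useful_life` and `constant_rate_valid` are hypothetical names.

```python
# Useful-life limits in years. Only the 10-year electrolytic capacitor figure
# comes from the text; the other entries are invented for illustration.
USEFUL_LIFE_YEARS = {
    "electrolytic capacitor": 10,
    "relay contacts": 25,
    "ceramic capacitor": 50,
}


def device_useful_life(component_lives):
    """Return (component, years) for the component that wears out first."""
    return min(component_lives.items(), key=lambda item: item[1])


def constant_rate_valid(mission_years, component_lives):
    """True if the mission time stays within the device useful life."""
    _component, limit = device_useful_life(component_lives)
    return mission_years <= limit


limiting_part, life_years = device_useful_life(USEFUL_LIFE_YEARS)
print(f"useful life limited by {limiting_part}: {life_years} years")
```

A check such as `constant_rate_valid(15, USEFUL_LIFE_YEARS)` returning `False` would flag that the constant failure rates from the FMEDA should not be relied on for a 15-year mission without replacing the limiting part.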

The future

Figure: FMEDA comparison studies

Further refinement of the component database, with selective calibration to different operation profiles, is needed. In addition, comparisons of FMEDA results with field failure studies have shown that human factors, especially maintenance procedures, affect the failure rates and failure modes of products.

As more data becomes available, the component database can be refined and updated. After a few years of research and refinement, [17] the database has been published [18] and is updated as required by new technology and new knowledge. The success of the FMEDA technique in supplying needed data in a relatively accurate way has allowed the probabilistic, performance-based approach to design to work.

References

  1. Component Reliability Database (CRD) Handbook, Sixth Edition. exida. 2023.
  2. Goble, William M.; Iwan van Beurden (2014). Combining field failure data with new instrument design margins to predict failure rates for SIS Verification (PDF) (Report). Proceedings of the 2014 International Symposium - BEYOND REGULATORY COMPLIANCE, MAKING SAFETY SECOND NATURE, Hilton College Station-Conference Center, College Station, Texas.
  3. Goble, W. M. Field Failure Data – the Good, the Bad and the Ugly (Report). Sellersville, PA: exida.
  4. "Dr. William Goble - CFSE - USA". exida.
  5. Goble, William M. (2024). The Essential DFMEA Process – Maximum Value / Optimal Cost (Report). exida.
  6. Collett, R.E.; Bachant, P.W. (1984). "Integration of BIT Effectiveness with FMECA". Annual Reliability and Maintainability Symposium, 1984. Proceedings. IEEE. pp. 300–305. doi:10.1109/RAMS.1984.764308.
  7. Amer, H. A.; McCluskey, E. J. (1987). Weighted Coverage in Fault-Tolerant Systems. IEEE. pp. 187–191.
  8. Goble, William M. (1992). Evaluating Control Systems Reliability, Techniques and Applications. ISA.
  9. FMEDA Analysis of CDM (Critical Discrete Module) – QUADLOG. Moore Products Company. 1994.
  10. Goble, W.M. (1998). The Use and Development of Quantitative Reliability and Safety Analysis in New Product Design. University Press, Eindhoven University of Technology, Netherlands.
  11. Goble, W.M. (1998). Control Systems Safety Evaluation and Reliability (2nd ed.). ISA.
  12. Goble, W.M.; A. C. Brombacher (1999). Using a Failure Modes, Effects and Diagnostic Analysis (FMEDA) to Measure Diagnostic Coverage in Programmable Electronic Systems. Reliability Engineering and System Safety, Vol. 66, No. 2.
  13. Chalupa, Rudy P. (2024). Get Your FMEDA Done Faster – Use Functional Effects. exida.
  14. Goble, William M. (2003). Accurate Failure Metrics for Mechanical Instruments. Proceedings of IEC 61508 Conference, Germany: Augsburg, RWTUV.
  15. Goble, William M.; J.V. Bukowski (2007). Development of a Mechanical Component Failure Database. 2007 Proceedings of the Annual Reliability and Maintainability Symposium NY: NY, IEEE.
  16. Electrical & Mechanical Component Reliability Handbook. exida. 2006.
  17. Goble, William M.; J.V. Bukowski (2007). Development of a Mechanical Component Failure Database. 2007 Proceedings of the Annual Reliability and Maintainability Symposium NY: NY, IEEE.
  18. Component Reliability Database (CRD) Handbook, Sixth Edition. exida. 2023. ISBN 978-1-934977-04-0.