Failure mode, effects, and criticality analysis

Failure mode effects and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).

FMEA is a bottom-up, inductive analytical method which may be performed at either the functional or piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value. FMECA tends to be preferred over FMEA in space and NATO military applications, while various forms of FMEA predominate in other industries.

History

FMECA was originally developed in the 1940s by the U.S. military, which published MIL-P-1629 in 1949. [1] By the early 1960s, contractors for the U.S. National Aeronautics and Space Administration (NASA) were using variations of FMECA under a variety of names. [2] [3] In 1966 NASA released its FMECA procedure for use on the Apollo program. [4] FMECA was subsequently used on other NASA programs including Viking, Voyager, Magellan, and Galileo. [5] Possibly because MIL-P-1629 was replaced by MIL-STD-1629 (SHIPS) in 1974, development of FMECA is sometimes incorrectly attributed to NASA. [6] While the space program developments were under way, use of FMEA and FMECA was already spreading to civil aviation. In 1967 the Society of Automotive Engineers released the first civil publication to address FMECA. [7] The civil aviation industry now tends to use a combination of FMEA and Fault Tree Analysis in accordance with SAE ARP4761 instead of FMECA, though some helicopter manufacturers continue to use FMECA for civil rotorcraft.

Ford Motor Company began using FMEA in the 1970s after problems experienced with its Pinto model, and by the 1980s FMEA was gaining broad use in the automotive industry. In Europe, the International Electrotechnical Commission published IEC 812 (now IEC 60812) in 1985, addressing both FMEA and FMECA for general use. [8] The British Standards Institute published BS 5760-5 in 1991 for the same purpose. [9]

In 1980, MIL-STD-1629A replaced both MIL-STD-1629 and the 1977 aeronautical FMECA standard MIL-STD-2070. [10] MIL-STD-1629A was canceled without replacement in 1998, but nonetheless remains in wide use for military and space applications today. [11]

Methodology

Slight differences are found between the various FMECA standards. By RAC CRTA-FMECA, the FMECA procedure typically consists of a sequence of logical steps, which are covered in the sections that follow.

FMECA may be performed at the functional or piece-part level. Functional FMECA considers the effects of failure at the functional block level, such as a power supply or an amplifier. Piece-part FMECA considers the effects of individual component failures, such as resistors, transistors, microcircuits, or valves. A piece-part FMECA requires far more effort, but provides better estimates of the probabilities of occurrence. A functional FMECA, however, can be performed much earlier, may help to structure the complete risk assessment, and provides a different kind of insight into mitigation options. The analyses are complementary.

The criticality analysis may be quantitative or qualitative, depending on the availability of supporting part failure data.

System definition

In this step, the major system to be analyzed is defined and partitioned into an indented hierarchy such as systems, subsystems or equipment, units or subassemblies, and piece-parts. Functional descriptions are created for the systems and allocated to the subsystems, covering all operational modes and mission phases.

Ground rules and assumptions

Before detailed analysis takes place, ground rules and assumptions are usually defined and agreed to. This might include, for example:

Block diagrams

Next, the systems and subsystems are depicted in functional block diagrams. Reliability block diagrams or fault trees are usually constructed at the same time. These diagrams are used to trace information flow at different levels of system hierarchy, identify critical paths and interfaces, and identify the higher level effects of lower level failures.

Failure mode identification

For each piece-part or each function covered by the analysis, a complete list of failure modes is developed. For functional FMECA, typical failure modes include:

For piece-part FMECA, failure mode data may be obtained from databases such as RAC FMD-91 [12] or RAC FMD-97. [13] These databases provide not only the failure modes, but also the failure mode ratios. For example:

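Device Failure Modes and Failure Mode Ratios (FMD-91)

| Device Type | Failure Mode | Ratio (α) |
|---|---|---|
| Relay | Fails to trip | 0.55 |
| Relay | Spurious trip | 0.26 |
| Relay | Short | 0.19 |
| Resistor, Composition | Parameter change | 0.66 |
| Resistor, Composition | Open | 0.31 |
| Resistor, Composition | Short | 0.03 |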

Each function or piece-part is then listed in matrix form with one row for each failure mode. Because FMECA usually involves very large data sets, a unique identifier must be assigned to each item (function or piece-part), and to each failure mode of each item.
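As an illustration of the row-per-failure-mode layout, the following minimal Python sketch builds matrix rows for the relay entry of the FMD-91 excerpt. The class name `FailureModeRow`, the helper `build_rows`, the item identifier `K1`, and the `item-NN` numbering scheme are all hypothetical choices made for this example; the standards do not prescribe a particular identifier format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModeRow:
    """One row of a FMECA matrix: a single failure mode of a single item."""
    item_id: str        # unique identifier of the function or piece-part
    mode_id: str        # unique identifier of this failure mode
    item_name: str
    failure_mode: str
    ratio: float        # failure mode ratio (alpha)


def build_rows(item_id, item_name, modes):
    """Create one row per failure mode, numbering the modes within the item."""
    return [
        FailureModeRow(item_id, f"{item_id}-{n:02d}", item_name, mode, ratio)
        for n, (mode, ratio) in enumerate(modes, start=1)
    ]


# Relay failure modes and ratios from the FMD-91 excerpt; "K1" is a
# hypothetical reference designator.
relay_rows = build_rows(
    "K1",
    "Relay",
    [("Fails to trip", 0.55), ("Spurious trip", 0.26), ("Short", 0.19)],
)

for row in relay_rows:
    print(row.mode_id, row.failure_mode, row.ratio)
```

Deriving the failure mode identifier from the item identifier keeps both unique and makes each row traceable back to its item when the matrix is later sorted or filtered.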

Failure effects analysis

Failure effects are determined and entered for each row of the FMECA matrix, considering the criteria identified in the ground rules. Effects are separately described for the local, next higher, and end (system) levels. System level effects may include:

The failure effect categories used at various hierarchical levels are tailored by the analyst using engineering judgment.

Severity classification

A severity classification is assigned to each failure mode of each unique item and entered on the FMECA matrix, based upon system level consequences. A small set of classifications, usually having 3 to 10 severity levels, is used. For example, when the analysis is prepared using MIL-STD-1629A, failure or mishap severity classification normally follows MIL-STD-882. [14]

Mishap Severity Categories (MIL-STD-882)

| Category | Description | Criteria |
|---|---|---|
| I | Catastrophic | Could result in death, permanent total disability, loss exceeding $1M, or irreversible severe environmental damage that violates law or regulation. |
| II | Critical | Could result in permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, loss exceeding $200K but less than $1M, or reversible environmental damage causing a violation of law or regulation. |
| III | Marginal | Could result in injury or occupational illness resulting in one or more lost work day(s), loss exceeding $10K but less than $200K, or mitigable environmental damage without violation of law or regulation where restoration activities can be accomplished. |
| IV | Negligible | Could result in injury or illness not resulting in a lost work day, loss exceeding $2K but less than $10K, or minimal environmental damage not violating law or regulation. |

Current FMECA severity categories for U.S. Federal Aviation Administration (FAA), NASA and European Space Agency space applications are derived from MIL-STD-882. [15] [16] [17]

Failure detection methods

For each component and failure mode, the ability of the system to detect and report the failure in question is analyzed. One of the following will be entered on each row of the FMECA matrix:

Criticality ranking

Failure mode criticality assessment may be qualitative or quantitative. For qualitative assessment, a mishap probability code or number is assigned and entered on the matrix. For example, MIL-STD-882 uses five probability levels:

Failure Probability Levels (MIL-STD-882)

| Description | Level | Individual Item | Fleet |
|---|---|---|---|
| Frequent | A | Likely to occur often in the life of the item | Continuously experienced |
| Probable | B | Will occur several times in the life of an item | Will occur frequently |
| Occasional | C | Likely to occur some time in the life of an item | Will occur several times |
| Remote | D | Unlikely but possible to occur in the life of an item | Unlikely, but can reasonably be expected to occur |
| Improbable | E | So unlikely, it can be assumed occurrence may not be experienced | Unlikely to occur, but possible |

The failure mode may then be charted on a criticality matrix using severity code as one axis and probability level code as the other. For quantitative assessment, the modal criticality number $C_m$ is calculated for each failure mode of each item, and the item criticality number $C_r$ is calculated for each item. The criticality numbers are computed using the following values: the basic failure rate $\lambda_p$, the failure mode ratio $\alpha$, the conditional probability $\beta$, and the mission phase duration $t$.

The criticality numbers are computed as $C_m = \lambda_p \alpha \beta t$ and $C_r = \sum_{n=1}^{N} (C_m)_n$. The basic failure rate $\lambda_p$ is usually fed into the FMECA from a failure rate prediction based on MIL-HDBK-217, PRISM, RIAC 217Plus, or a similar model. The failure mode ratio $\alpha$ may be taken from a database source such as RAC FMD-97. For functional level FMECA, engineering judgment may be required to assign the failure mode ratio. The conditional probability number $\beta$ represents the conditional probability that the failure effect will result in the identified severity classification, given that the failure mode occurs. It represents the analyst's best judgment as to the likelihood that the loss will occur. For graphical analysis, a criticality matrix may be charted using either $C_m$ or $C_r$ on one axis and severity code on the other.
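The quantitative calculation can be sketched in a few lines of Python. The sketch assumes the usual MIL-STD-1629A form, in which the modal criticality number is the product of the basic failure rate, the failure mode ratio, the conditional probability, and the mission phase duration, and the item criticality number is the sum of the modal values. The function names, the failure rate of 2e-6 failures per hour, the 100-hour duration, and the conditional probabilities are hypothetical values chosen for illustration; only the relay failure mode ratios come from the FMD-91 excerpt.

```python
def modal_criticality(failure_rate, alpha, beta, duration):
    """Modal criticality number: failure_rate * alpha * beta * duration."""
    return failure_rate * alpha * beta * duration


def item_criticality(failure_rate, duration, modes):
    """Item criticality number: sum of the modal criticality numbers.

    `modes` is an iterable of (alpha, beta) pairs. Some standards sum only
    the modes that share a severity classification; this sketch simply sums
    whatever modes are passed in.
    """
    return sum(
        modal_criticality(failure_rate, alpha, beta, duration)
        for alpha, beta in modes
    )


# Hypothetical relay: failure rate in failures per hour, duration in hours.
relay_failure_rate = 2e-6
mission_duration = 100.0

# (alpha, beta) per failure mode; alpha from FMD-91, beta assumed.
relay_modes = [
    (0.55, 1.0),   # fails to trip
    (0.26, 0.5),   # spurious trip
    (0.19, 0.1),   # short
]

cm_fails_to_trip = modal_criticality(relay_failure_rate, 0.55, 1.0, mission_duration)
cr_relay = item_criticality(relay_failure_rate, mission_duration, relay_modes)

print(f"Cm (fails to trip) = {cm_fails_to_trip:.3e}")
print(f"Cr (relay)         = {cr_relay:.3e}")
```

Because the failure rate and the duration must use the same time unit, a real analysis would normally fix that unit in the ground rules.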

Critical item/failure mode list

Once the criticality assessment is completed for each failure mode of each item, the FMECA matrix may be sorted by severity and qualitative probability level or quantitative criticality number. This enables the analysis to identify critical items and critical failure modes for which design mitigation is desired.
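A minimal Python sketch of the qualitative sort is shown below, assuming severity categories I to IV and probability levels A to E as in MIL-STD-882. The row contents, the identifiers, and the cut-off used to call a failure mode "critical" (severity II or worse combined with probability C or more frequent) are hypothetical; in practice the threshold would come from the program's ground rules.

```python
# Lower rank means more severe / more probable.
SEVERITY_RANK = {"I": 0, "II": 1, "III": 2, "IV": 3}
PROBABILITY_RANK = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}


def rank_rows(rows):
    """Sort rows by severity first, then by qualitative probability level."""
    return sorted(
        rows,
        key=lambda r: (SEVERITY_RANK[r["severity"]],
                       PROBABILITY_RANK[r["probability"]]),
    )


def critical_rows(rows, max_severity="II", max_probability="C"):
    """Keep rows at or above the (assumed) severity and probability cut-offs."""
    return [
        r for r in rank_rows(rows)
        if SEVERITY_RANK[r["severity"]] <= SEVERITY_RANK[max_severity]
        and PROBABILITY_RANK[r["probability"]] <= PROBABILITY_RANK[max_probability]
    ]


# Hypothetical FMECA matrix rows.
rows = [
    {"mode_id": "K1-02", "severity": "III", "probability": "B"},
    {"mode_id": "K1-01", "severity": "I", "probability": "D"},
    {"mode_id": "R7-02", "severity": "II", "probability": "C"},
    {"mode_id": "R7-01", "severity": "I", "probability": "B"},
]

ranked = rank_rows(rows)
critical = critical_rows(rows)

print([r["mode_id"] for r in ranked])
print([r["mode_id"] for r in critical])
```

For a quantitative analysis the same approach applies with the criticality number replacing the probability level as the second sort key.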

Recommendations

After the FMECA is performed, design recommendations are made to reduce the consequences of critical failures. These may include selecting components with higher reliability, reducing the stress level at which a critical item operates, or adding redundancy or monitoring to the system.

Maintainability analysis

FMECA usually feeds into both Maintainability Analysis and Logistics Support Analysis, which both require data from the FMECA. FMECA is also widely used for failure and criticality analysis aimed at improving system performance. As industries adopt predictive maintenance strategies for their mechanical systems under Industry 4.0, FMECA is used to identify and prioritize the failure modes of those systems and their subsystems. [18]

FMECA report

An FMECA report consists of a system description, ground rules and assumptions, conclusions and recommendations, corrective actions to be tracked, and the attached FMECA matrix, which may be in spreadsheet, worksheet, or database form.

Risk priority calculation

RAC CRTA-FMECA and MIL-HDBK-338 both identify Risk Priority Number (RPN) calculation as an alternate method to criticality analysis. The RPN is the product of detectability (D), severity (S), and occurrence (O). With each factor rated on a scale from 1 to 10, the highest possible RPN is 10 × 10 × 10 = 1000, which describes a failure that is not detectable by inspection, is very severe, and is almost certain to occur. If the same failure occurred only very rarely, the occurrence rating would be 1 and the RPN would drop to 100. In this way, the analysis makes it possible to focus on the highest risks.
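The arithmetic can be sketched in Python as follows. The function name `risk_priority_number` and the range check are assumptions made for this example; the 1 to 10 scale and the two worked values (1000 and 100) come from the text above.

```python
def risk_priority_number(detectability, severity, occurrence):
    """Return D x S x O, with each rating expected on a 1 to 10 scale."""
    ratings = {
        "detectability": detectability,
        "severity": severity,
        "occurrence": occurrence,
    }
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {value}")
    return detectability * severity * occurrence


# Undetectable, very severe, almost certain to occur.
worst_case = risk_priority_number(10, 10, 10)

# Same failure, but occurring only very rarely.
rare_case = risk_priority_number(10, 10, 1)

print(worst_case, rare_case)
```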

Advantages and disadvantages

Strengths of FMECA include its comprehensiveness, the systematic establishment of relationships between failure causes and effects, and its ability to point out individual failure modes for corrective action in design.

Weaknesses include the extensive labor required, the large number of trivial cases considered, and inability to deal with multiple-failure scenarios or unplanned cross-system effects such as sneak circuits.

According to an FAA research report for commercial space transportation,

Failure Modes, effects, and Criticality Analysis is an excellent hazard analysis and risk assessment tool, but it suffers from other limitations. This alternative does not consider combined failures or typically include software and human interaction considerations. It also usually provides an optimistic estimate of reliability. Therefore, FMECA should be used in conjunction with other analytical tools when developing reliability estimates. [19]

References

  1. Procedures for Performing a Failure Mode Effects and Criticality Analysis. U.S. Department of Defense. 1949. MIL-P-1629.
  2. Neal, R.A. (1962). Modes of Failure Analysis Summary for the Nerva B-2 Reactor (pdf). Westinghouse Electric Corporation Astronuclear Laboratory. hdl:2060/19760069385. WANL-TNR-042. Retrieved 2010-03-13.
  3. Dill, Robert; et al. (1963). State of the Art Reliability Estimate of Saturn V Propulsion Systems (pdf). General Electric Company. hdl:2060/19930075105. RM 63TMP-22. Retrieved 2010-03-13.
  4. Procedure for Failure Mode, Effects and Criticality Analysis (FMECA) (pdf). National Aeronautics and Space Administration. 1966. hdl:2060/19700076494. RA-006-013-1A. Retrieved 2010-03-13.
  5. Failure Modes, Effects, and Criticality Analysis (FMECA) (PDF). National Aeronautics and Space Administration JPL. PD-AD-1307. Retrieved 2010-03-13.
  6. Borgovini, Robert; Pemberton, S.; Rossi, M. (1993). Failure Mode, Effects and Criticality Analysis (FMECA). B. Reliability Analysis Center. p. 5. CRTA-FMECA. Archived from the original (pdf) on 2011-06-04. Retrieved 2010-03-03.
  7. Design Analysis Procedure For Failure Modes, Effects and Criticality Analysis (FMECA). Society of Automotive Engineers. 1967. ARP926.
  8. Analysis techniques for system reliability – Procedure for failure mode and effects analysis (FMEA) (PDF). International Electrotechnical Commission. 1985. IEC 812. Retrieved 2013-08-08.
  9. Reliability of Systems, Equipment and Components Part 5: Guide to Failure Modes, Effects and Criticality Analysis (FMEA and FMECA). British Standards Institute. 1991. BS 5760-5.
  10. Procedures for Performing a Failure Mode, Effects and Criticality Analysis. A. U.S. Department of Defense. 1980. MIL-STD-1629A. Archived from the original (pdf) on 2011-07-22. Retrieved 2010-03-14.
  11. "7.8 Failure Mode and Effects Analysis (FMEA)". Electronic Reliability Design Handbook. B. U.S. Department of Defense. 1998. MIL-HDBK-338B. Archived from the original (pdf) on 2011-07-22. Retrieved 2010-03-13.
  12. Chandler, Gregory; Denson, W.; Rossi, M.; Wanner, R. (1991). Failure Mode/Mechanism Distributions (PDF). Reliability Analysis Center. FMD-91. Archived from the original (pdf) on 2019-09-04. Retrieved 2010-03-14.
  13. Failure Mode/Mechanism Distributions. Reliability Analysis Center. 1997. FMD-97.
  14. Standard Practice for System Safety. D. U.S. Department of Defense. 1998. MIL-STD-882D. Archived from the original (pdf) on 2011-07-22. Retrieved 2010-03-14.
  15. NASA Systems Engineering Handbook (PDF). National Aeronautics and Space Administration. SP-610S.
  16. Failure Modes, Effects and Criticality Analysis (FMECA). D. European Space Agency. 1991. ECSS-Q-30-02A.
  17. Reusable Launch and Reentry Vehicle System Safety Processes (PDF). Federal Aviation Administration. 2005. AC 431.35-2A. Archived from the original (PDF) on 2017-02-10. Retrieved 2010-03-14.
  18. Thoppil, Nikhil M.; Vasu, V.; Rao, C. S. P. (27 August 2019). "Failure Mode Identification and Prioritization Using FMECA: A Study on Computer Numerical Control Lathe for Predictive Maintenance". Journal of Failure Analysis and Prevention. 19 (4): 1153–1157. doi:10.1007/s11668-019-00717-8. ISSN 1864-1245. S2CID 201750563.
  19. Research and Development Accomplishments FY 2004 (PDF). Federal Aviation Administration. 2004. Retrieved 2010-03-14.