Reliability prediction for electronic components

Reliability prediction is an important element in selecting equipment for telecommunications service providers and other buyers of electronic equipment, and it is essential during the design stage of the engineering system life cycle. [1] Reliability is a measure of the frequency of equipment failures as a function of time. It has a major impact on maintenance and repair costs and on the continuity of service. [2]

Every product has a failure rate, λ, which is the number of units failing per unit time. This failure rate changes throughout the life of the product. It is the manufacturer’s aim to ensure that product in the “infant mortality period” does not reach the customer. This leaves a product with a useful life period, during which failures occur randomly (i.e., λ is constant), and finally a wear-out period, usually beyond the product’s useful life, during which λ increases.
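The three life periods can be illustrated with a short sketch. It assumes a Weibull failure model, which is not mentioned in the source but is a common way to represent a decreasing, constant, or increasing failure rate through a single shape parameter. The function names and the numeric values are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull model at time t > 0.

    Assumption: the Weibull model is used here only for illustration.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def life_phase(shape):
    """Map the Weibull shape parameter to the life period it resembles."""
    if shape < 1:
        return "infant mortality (failure rate decreasing)"
    if shape == 1:
        return "useful life (failure rate constant)"
    return "wear-out (failure rate increasing)"


# Compare the failure rate early and late for three hypothetical shapes.
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, shape, scale=1000.0)
    late = weibull_hazard(900.0, shape, scale=1000.0)
    print(f"shape={shape}: early={early:.6f}, late={late:.6f}, {life_phase(shape)}")
```

With a shape of exactly 1 the failure rate no longer depends on time, which corresponds to the constant-λ useful life period described above.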

Definition of reliability

A practical definition of reliability is “the probability that a piece of equipment operating under specified conditions shall perform satisfactorily for a given period of time”. Because it is a probability, reliability is a number between 0 and 1.

MTBF and MTTF

MTBF (mean operating time between failures) applies to equipment that is going to be repaired and returned to service, whereas MTTF (mean time to failure) applies to parts that will be thrown away on failing. During the ‘useful life period’, assuming a constant failure rate, MTBF is the inverse of the failure rate (MTBF = 1/λ), so the two can be used interchangeably.
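The relationship between failure rate, MTBF, and reliability during the useful life period can be sketched as follows. The sketch assumes a constant failure rate, for which the probability of surviving to time t is exp(-λt). The function names and the example failure rate are hypothetical.

```python
import math


def mtbf_from_failure_rate(failure_rate):
    """MTBF in hours for a constant failure rate in failures per hour."""
    return 1.0 / failure_rate


def reliability(t, failure_rate):
    """Probability of operating without failure for t hours.

    Valid only under the constant-failure-rate assumption.
    """
    return math.exp(-failure_rate * t)


failure_rate = 2e-6                      # hypothetical: failures per hour
mtbf = mtbf_from_failure_rate(failure_rate)
hours_per_year = 8760

print(f"MTBF: {mtbf:.0f} hours")
print(f"Reliability over one year: {reliability(hours_per_year, failure_rate):.4f}")
print(f"Reliability at t = MTBF: {reliability(mtbf, failure_rate):.4f}")
```

Under this assumption the reliability at t = MTBF is exp(-1), about 0.37, so an MTBF figure should not be read as a guaranteed failure-free lifetime.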

Importance of reliability prediction

Reliability predictions:

  • Help assess the effect of product reliability on the maintenance activity and on the quantity of spare units required for acceptable field performance of any particular system. For example, predictions of the frequency of unit level maintenance actions can be obtained. Reliability prediction can be used to size spare populations.
  • Provide necessary input to system-level reliability models. System-level reliability models can subsequently be used to predict, for example, frequency of system outages in steady-state, frequency of system outages during early life, expected downtime per year, and system availability.
  • Provide necessary input to unit and system-level life cycle cost analyses. Life cycle cost studies determine the cost of a product over its entire life. Therefore, how often a unit will have to be replaced needs to be known. Inputs to this process include unit and system failure rates. This includes how often units and systems fail during the first year of operation as well as in later years.
  • Assist in deciding which product to purchase from a list of competing products. For such comparisons to be meaningful, it is essential that reliability predictions be based on a common procedure.
  • Can be used to set factory test standards for products requiring a reliability test. Reliability predictions help determine how often the system should fail.
  • Are needed as input to the analysis of complex systems such as switching systems and digital cross-connect systems. It is necessary to know how often different parts of the system are going to fail, even for redundant components.
  • Can be used in design trade-off studies. For example, a supplier could look at a design with many simple devices and compare it to a design with fewer devices that are newer but more complex. The unit with fewer devices is usually more reliable.
  • Can be used to set achievable in-service performance standards against which to judge actual performance and stimulate action.
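As an illustration of the first item in the list above, a predicted failure rate can be turned into a spares estimate. The sketch below is one simple approach, not a procedure taken from the source: it assumes a constant failure rate, models the number of failures in a period as Poisson distributed, and assumes failed units are not repaired and returned to stock within the period. All names and figures are hypothetical.

```python
import math


def expected_failures(units_in_service, failure_rate, period_hours):
    """Mean number of failures in the period (constant failure rate assumed)."""
    return units_in_service * failure_rate * period_hours


def spares_needed(mean_failures, confidence):
    """Smallest stock s such that P(failures <= s) >= confidence.

    Assumption: failures follow a Poisson distribution with the given mean.
    The plain summation below is only suitable for modest means.
    """
    if not 0.0 < confidence < 1.0 or not 0.0 <= mean_failures < 700.0:
        raise ValueError("confidence must be in (0, 1) and mean in [0, 700)")
    term = math.exp(-mean_failures)   # P(exactly 0 failures)
    cumulative = term
    spares = 0
    while cumulative < confidence:
        spares += 1
        term *= mean_failures / spares  # P(exactly `spares` failures)
        cumulative += term
    return spares


# Hypothetical population: 200 units, 5e-6 failures per hour, one year.
mean = expected_failures(200, 5e-6, 8760)
print(f"Expected failures per year: {mean:.2f}")
print(f"Spares for 95% confidence: {spares_needed(mean, 0.95)}")
```

The stock level comes out higher than the expected number of failures, because the spares must cover years with more failures than average.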

The telecommunications industry has devoted much time over the years to developing reliability models for electronic equipment. One resulting tool is the automated reliability prediction procedure (ARPP), an Excel-spreadsheet software tool that automates the reliability prediction procedures in SR-332, Reliability prediction procedure for electronic equipment. FD-ARPP-01 provides suppliers and manufacturers with a tool for making reliability prediction procedure (RPP) calculations. It also helps users understand RPP calculations by working through interactive examples that they supply themselves.

The RPP views electronic systems as hierarchical assemblies. Systems are constructed from units that, in turn, are constructed from devices. The methods presented predict reliability at these three hierarchical levels:

  1. Device: A basic component (or part)
  2. Unit: Any assembly of devices. This may include, but is not limited to, circuit packs, modules, plug-in units, racks, power supplies, and ancillary equipment. Unless otherwise dictated by maintenance considerations, a unit will usually be the lowest level of replaceable assemblies/devices. The RPP is aimed primarily at reliability prediction of units.
  3. Serial System: Any assembly of units for which the failure of any single unit will cause a failure of the system.
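The device, unit, and serial system hierarchy above can be sketched as a simple roll-up of failure rates. This is a simplified illustration, not the SR-332 procedure itself: it assumes constant failure rates and independent failures, and it omits any adjustment factors that a real prediction procedure applies to device failure rates. The unit names and all numeric values are hypothetical.

```python
def unit_failure_rate(device_failure_rates):
    """Failure rate of a unit that fails when any one of its devices fails."""
    return sum(device_failure_rates)


def serial_system_failure_rate(unit_failure_rates):
    """Failure rate of a serial system: any single unit failure fails the system."""
    return sum(unit_failure_rates)


# Hypothetical device failure rates, in failures per hour.
power_supply = unit_failure_rate([2.0e-7, 5.0e-8, 5.0e-8])
line_card = unit_failure_rate([1.0e-7, 1.0e-7, 3.0e-7])

system = serial_system_failure_rate([power_supply, line_card])

print(f"Power supply unit: {power_supply:.2e} failures/hour")
print(f"Line card unit:    {line_card:.2e} failures/hour")
print(f"Serial system:     {system:.2e} failures/hour")
print(f"System MTBF:       {1.0 / system:.0f} hours")
```

Because the rates add, a serial system is always less reliable than its least reliable unit, which is why the definition above is restricted to systems without redundancy.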

References

  1. EPSMA, “Guidelines to Understanding Reliability Predictions”, EPSMA, 2005
  2. Terry Donovan, Senior Systems Engineer Telcordia Technologies. Member of Optical Society of America, IEEE, "Automated Reliability Prediction, SR-332, Issue 3", January 2011; "Automated Reliability Prediction (ARPP), FD-ARPP-01, Issue 11", January 2011