High-temperature operating life (HTOL) is a reliability test applied to integrated circuits (ICs) to determine their intrinsic reliability. The test stresses the IC at an elevated temperature and high voltage under dynamic operation for a predefined period of time. The IC is usually monitored under stress and tested at intermediate intervals. This reliability stress test is sometimes referred to as a lifetime test, device life test or extended burn-in test, and is used to trigger potential failure modes and assess IC lifetime.
There are several types of HTOL:
HTOL type | Description |
---|---|
Static | The IC is stressed at static, constant conditions; the IC is not toggling. |
Dynamic | An input stimulus toggles the internal nodes of the device. |
Monitored | An input stimulus toggles the internal nodes of the device. A live output indicates IC performance. |
In-situ tested | An input stimulus toggles the internal nodes of the device. A responsive output tests IC performance. |
The main aim of HTOL is to age the device so that a short experiment allows the lifetime of the IC to be predicted (e.g. 1,000 HTOL hours shall predict a minimum of "X" years of operation). A good HTOL process avoids both relaxed HTOL operation and overstressing the IC. The method ages all of the IC's building blocks so that the relevant failure modes are triggered within a short reliability experiment. A multiplier known as the acceleration factor (AF) relates the short stress time to long-lifetime operation.
The AF represents the accelerated aging factor relative to the useful life application conditions.
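For effective HTOL stress testing, several variables should be considered: the digital toggling factor, analog module operation, I/O ring activity, monitor design, ambient and junction temperature, voltage stress, the acceleration factor, test duration and sample size.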
A detailed description of the above variables, together with the HTOL design considerations for each, is provided below, using a hypothetical, simplified IC with several RAMs, digital logic, an analog voltage regulator module and an I/O ring.
The digital toggling factor (DTF) represents the number of transistors that change their state during the stress test, relative to the total number of gates in the digital portion of the IC. In effect, the DTF is the percentage of transistors toggling in one time unit. The time unit is relative to the toggling frequency, which is usually limited by the HTOL setup to the range of 10–20 MHz.
Reliability engineers strive to toggle as many transistors as possible in each time unit. The RAMs (and other memory types) are usually activated using the BIST function, while the logic is usually activated with the SCAN function, an LFSR or logic BIST.
The power and the self-heating of the digital portion of the IC are evaluated and the device's aging estimated. These two measures are aligned so that they are similar to the aging of other elements of the IC. The degrees of freedom for aligning these measures are the voltage stress and/or the time period during which the HTOL program loops these blocks relative to other IC blocks.
The recent trend of integrating as many electronic components as possible into a single chip is known as system on a chip (SoC).
This trend complicates reliability engineers' work because (usually) the analog portion of the chip dissipates higher power relative to the other IC elements.
This higher power may generate hot spots and areas of accelerated aging. Reliability engineers must understand the power distribution on the chip and align the aging so that it is similar for all elements of an IC.
In our hypothetical SoC the analog module only includes a voltage regulator. In reality, there may be additional analog modules e.g. PMIC, oscillators, or charge pumps. To perform efficient stress tests on the analog elements, reliability engineers must identify the worst-case scenario for the relevant analog blocks in the IC. For example, the worst-case scenario for voltage regulators may be the maximum regulation voltage and maximum load current; for charge pumps it may be the minimum supply voltage and maximum load current.
Good engineering practice calls for the use of external loads (external R, L, C) to force the necessary currents. This practice avoids loading differences due to the chip's different operational schemes and operation trimming of its analog parts.
Statistical methods are used to check the statistical tolerances, variation and temperature stability of the loads used, and to define the right confidence bands for the loads to avoid over- or under-stress across the HTOL operating range. The degrees of freedom for aligning the aging magnitude of the analog parts are usually the duty cycle, the external load values and the voltage stress.
The interface between the "outside world" and the IC is made via the input/output (I/O) ring. This ring contains power I/O ports, digital I/O ports and analog I/O ports. The I/Os are (usually) wired via the IC package to the "outside world", and each I/O executes its own specific command instructions, e.g. JTAG ports, IC power supply ports, etc. Reliability engineering aims to age all I/Os in the same way as the other IC elements. This can be achieved by using a boundary scan operation.
As previously mentioned, the main aim of HTOL is to age the samples by dynamic stress at elevated voltage and/or temperature. During HTOL operation, it must be assured that the IC is active, toggling and constantly functioning.
At the same time, it is necessary to know at what point the IC stops responding; these data are important for calculating precise reliability indices and for facilitating failure analysis (FA). This is done by monitoring the device via one or more vital IC parameter signals, which are communicated to and logged by the HTOL machine and provide a continuous indication of the IC's functionality throughout the HTOL run time. Examples of commonly used monitors include the BIST "done" flag signal, the SCAN output chain or the analog module output.
There are three types of monitoring:
According to JEDEC standards, the environmental chamber should be capable of maintaining the specified temperature within a tolerance of ±5 °C throughout the chamber while parts are loaded and unpowered. Today's environmental chambers have better capabilities and can hold the temperature within ±3 °C.
Low-power ICs can be stressed without major attention to self-heating effects. However, due to technology scaling and manufacturing variations, power dissipation within a single production lot of devices can vary by as much as 40%. This variation, together with the self-heating of high-power ICs, makes advanced contact temperature control necessary, with an individual control system for each IC.
The operating voltage should be at least the maximum specified for the device. In some cases a higher voltage is applied to obtain lifetime acceleration from voltage as well as temperature.
To define the maximum permitted voltage stress, the following methods can be considered:
Reliability engineers must check that Vstress does not exceed the maximum rated voltage for the relevant technology, as specified by the FAB.
The acceleration factor (AF) is a multiplier that relates a product's life at an accelerated stress level to the life at the use stress level.
An AF of 20 means that 1 hour at stress conditions is equivalent to 20 hours at use conditions.
The voltage acceleration factor is represented by AFv. Usually the stress voltage is equal to or higher than the maximum voltage. An elevated voltage provides additional acceleration and can be used to increase effective device hours or achieve an equivalent life point.
There are several AFv models:
AFtemp is the acceleration factor due to changes in temperature and is usually based on the Arrhenius equation. The total acceleration factor is the product of AFv and AFtemp.
The reliability test duration is chosen to demonstrate that the device meets its lifetime requirement.
For example, with an activation energy of 0.7 eV, 125 °C stress temperature and 55 °C use temperature, the acceleration factor (Arrhenius equation) is 78.6. This means that 1,000 hours' stress duration is equivalent to 9 years of use. The reliability engineer decides on the qualification test duration. Industry good practice calls for 1,000 hours at a junction temperature of 125 °C.
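The arithmetic in this example can be checked with a minimal sketch, assuming the standard Arrhenius form of the thermal acceleration factor. The function name `arrhenius_af` and the 273.15 K offset are illustrative choices, not taken from the text. With these constants the result comes out at roughly 78; the 78.6 quoted above depends on how the Boltzmann constant and the kelvin conversion are rounded.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Thermal acceleration factor between a use and a stress temperature (deg C)."""
    t_use_k = t_use_c + 273.15      # convert to kelvin
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


af = arrhenius_af(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
# Express 1,000 stress hours as years of operation at the use temperature.
equivalent_years = 1000 * af / (24 * 365)
print(f"AF = {af:.1f}; 1,000 stress hours ~ {equivalent_years:.1f} years of use")
```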
The challenge for new reliability assessment and qualification systems is determining the relevant failure mechanisms to optimize sample size.
Sample plans are statistically derived from manufacturer risk, consumer risk, and the expected failure rate. The commonly used sampling plan of zero rejects out of 230 samples is equivalent to three rejects out of 668 samples, assuming a lot tolerance percent defective (LTPD) of 1% and a 90% confidence level.
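The equivalence of the two plans can be illustrated with a short sketch. It assumes the Poisson approximation that is commonly used to tabulate LTPD plans: the sample size is chosen so that a lot at the LTPD defect level is accepted with probability of only 1 minus the confidence level. The helper names are hypothetical.

```python
import math


def poisson_cdf(c: int, mean: float) -> float:
    """P(X <= c) for a Poisson-distributed X with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, c + 1):
        term *= mean / k
        total += term
    return total


def ltpd_sample_size(rejects: int, ltpd_percent: float, confidence: float) -> int:
    """Sample size at which a lot at the LTPD level passes with probability 1 - confidence."""
    target = 1.0 - confidence
    lo, hi = 0.0, 100.0  # bracket for the expected number of defectives in the sample
    for _ in range(200):  # bisection: the CDF decreases as the mean grows
        mid = (lo + hi) / 2
        if poisson_cdf(rejects, mid) > target:
            lo = mid
        else:
            hi = mid
    return round(lo / (ltpd_percent / 100.0))


print(ltpd_sample_size(rejects=0, ltpd_percent=1.0, confidence=0.90))
print(ltpd_sample_size(rejects=3, ltpd_percent=1.0, confidence=0.90))
```

Under this approximation both plans give the same protection against a 1% defective lot, which is why they are treated as interchangeable.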
Samples shall include representative samples from at least three nonconsecutive lots to represent manufacturing variability. All test samples shall be fabricated, handled, screened and assembled in the same way as during the production phase.
Samples shall be tested prior to stress and at predefined checkpoints. It is good engineering practice to test samples at maximum and minimum rating temperatures as well as at room temperature. Data logs of all functional and parametric tests shall be collated for further analysis.
Assuming Tj = 125 °C, commonly used checkpoints are after 48, 168, 500 and 1,000 hours.
Different checkpoints for different temperatures can be calculated by using the Arrhenius equation. For example, with an activation energy of 0.7 eV, Tj of 135 °C and Tuse of 55 °C, the equivalent checkpoints will be at 29, 102, 303 and 606 hours.
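A minimal sketch of this rescaling follows, assuming the Arrhenius model with the same activation energy at both stress temperatures. The use temperature cancels out, because only the ratio of the two acceleration factors matters. The function name is illustrative, and the results match the checkpoints quoted above to within about an hour, depending on how the constants are rounded.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def equivalent_checkpoints(hours_at_ref, ea_ev, t_ref_c, t_new_c):
    """Rescale checkpoint hours from a reference junction temperature to a new one."""
    t_ref_k = t_ref_c + 273.15
    t_new_k = t_new_c + 273.15
    # Ratio of acceleration factors AF(new) / AF(ref); the use temperature cancels.
    ratio = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_ref_k - 1.0 / t_new_k))
    return [h / ratio for h in hours_at_ref]


checkpoints_135 = equivalent_checkpoints([48, 168, 500, 1000], ea_ev=0.7,
                                         t_ref_c=125.0, t_new_c=135.0)
print([round(h) for h in checkpoints_135])
```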
Electrical testing should be completed as soon as possible after the samples are removed. If the samples cannot be tested soon after their removal, additional stress time should be applied. The JEDEC standard requires samples be tested within 168 hours of removal.
If testing exceeds the recommended time window, additional stress should be applied according to the table below: [2]
Time above recommended time window | 0 h < h ≤ 168 h | 168 h < h ≤ 336 h | 336 h < h ≤ 504 h | Other |
---|---|---|---|---|
Additional stress hours | 24 h | 48 h | 72 h | 24 hours for each 168 hours |
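The table can be read as a simple rule: 24 additional stress hours for every started 168-hour block beyond the recommended window. The sketch below encodes that reading; treating the "Other" column as a continuation of the same rule is an assumption, and the function name is hypothetical.

```python
import math


def additional_stress_hours(hours_over_window: float) -> int:
    """Additional stress hours for a given time beyond the recommended test window."""
    if hours_over_window <= 0:
        return 0  # tested within the window: no additional stress needed
    # 24 h of extra stress for each started block of 168 h over the window.
    return 24 * math.ceil(hours_over_window / 168)


for overrun in (100, 168, 169, 504, 600):
    print(overrun, additional_stress_hours(overrun))
```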
The merit number is the outcome of statistical sampling plans.
Sampling plans are input to SENTENCE, an audit tool, to ensure that the output of a process meets the requirements. SENTENCE simply accepts or rejects the tested lots. The reliability engineer implements statistical sampling plans based on predefined acceptance quality limits (AQL), LTPD, manufacturer risk and customer risk. For example, the commonly used sampling plan of 0 rejects out of 230 samples is equivalent to 3 rejects out of 668 samples, assuming LTPD = 1%.
The aging process of an IC is relative to its standard use conditions. The tables below provide reference to various commonly used products and the conditions under which they are used.
Reliability engineers are tasked with verifying the adequate stress duration. For example, for an activation energy of 0.7 eV, a stress temperature of 125 °C and a use temperature of 55 °C, an expected operational life of five years is represented by a 557-hour HTOL experiment.
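The duration in this example follows directly from dividing the target lifetime by the acceleration factor. A minimal sketch, taking the acceleration factor of 78.6 quoted earlier as an input (the function name is illustrative):

```python
HOURS_PER_YEAR = 24 * 365


def required_stress_hours(life_years: float, acceleration_factor: float) -> float:
    """HTOL hours needed to represent a target operating life at use conditions."""
    return life_years * HOURS_PER_YEAR / acceleration_factor


hours = required_stress_hours(life_years=5, acceleration_factor=78.6)
print(round(hours))  # about 557 hours
```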
Min Tuse | Max Tuse | Description | Expected life time |
---|---|---|---|
5 °C | 50 °C | desktop products | 5 years |
0 °C | 70 °C | mobile products | 4 years |
Example automotive use conditions [1]
Min Tuse | Max Tuse | Description | Expected life time |
---|---|---|---|
−40 °C | 105–150 °C | under hood condition | 10 to 15 years |
−40 °C | 80 °C | passenger compartment condition | 10 to 15 years |
0 °C | 70 °C | passenger compartment condition | 10 to 15 years |
Example European telecom use conditions
Min Tuse | Max Tuse | Description | Expected life time |
---|---|---|---|
5 °C | 40 °C | class 3.1 Temperature-controlled locations | usually 25 years |
−5 °C | 45 °C | class 3.2 Partly temperature-controlled locations | usually 25 years |
−25 °C | 55 °C | class 3.3 Not temperature-controlled locations | usually 25 years |
−40 °C | 70 °C | class 3.4 Sites with heat-trap | usually 25 years |
−40 °C | 40 °C | class 3.5 Sheltered locations, Direct solar radiation | usually 25 years |
Example US telecom use conditions
Min Tuse | Max Tuse | Description | Expected life time |
---|---|---|---|
−40 °C | 46 °C | Uncontrolled environment | 25 years |
5 °C | 40 °C | Enclosed building | 25 years |
Example military use conditions
Min Tuse | Max Tuse | Description |
---|---|---|
−55 °C | 125 °C | MIL products |
−55 °C | up to 225 °C | high-temp applications |
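Number of failures = $r$
Number of devices = $D$
Test hours per device = $H$
Calculation temperature in kelvin = $T$ (temperature in °C + 273)
Test temperature (HTRB or other burn-in temperature) = $T_{test}$
Use temperature (standardized at 55 °C or 328 K) = $T_{use}$
Activation energy (eV) = $E_a$
$\chi^2(\alpha, \nu)/2$ is the probability estimation for the number of failures at confidence level $\alpha$ and $\nu = 2r + 2$ degrees of freedom
Acceleration factor from the Arrhenius equation = $A_f$
Boltzmann constant ($k$) = 8.617×10⁻⁵ eV/K
Device hours (DH) = $D \times H$
Equivalent device hours (EDH) = $D \times H \times A_f$
Failure rate per hour = $\lambda_{hour}$
Failures in time = failure rate per billion hours = FIT = $\lambda_{FIT}$
Mean time to failure = MTTF
Where the acceleration factor from the Arrhenius equation is:
$$A_f = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{test}}\right)\right]$$
Failure rate per hour:
$$\lambda_{hour} = \frac{\chi^2(\alpha, \nu)}{2 \times D \times H \times A_f}$$
Failures in time = failure rate per billion hours:
$$\lambda_{FIT} = 10^9 \times \lambda_{hour}$$
Mean time to failure in hours:
$$MTTF_{hours} = \frac{1}{\lambda_{hour}}$$
Mean time to failure in years:
$$MTTF_{years} = \frac{1}{\lambda_{hour} \times 24 \times 365}$$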
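The calculation chain above (acceleration factor, equivalent device hours, failure rate, FIT and MTTF) can be sketched as follows. The inputs (230 devices, 1,000 hours, zero failures, 60% confidence) are hypothetical illustration values, not results from a real test. To keep the sketch free of external libraries, the chi-squared term is obtained from the equivalent Poisson relation: half of the chi-squared value at 2r + 2 degrees of freedom is the Poisson mean m for which P(X ≤ r) equals one minus the confidence level.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def half_chi_squared(confidence: float, failures: int) -> float:
    """chi^2(confidence, 2r + 2) / 2, solved via the Poisson CDF by bisection."""
    target = 1.0 - confidence
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2
        term = math.exp(-mid)
        cdf = term
        for k in range(1, failures + 1):  # accumulate P(X <= failures)
            term *= mid / k
            cdf += term
        if cdf > target:
            lo = mid
        else:
            hi = mid
    return lo


def failure_rate_per_hour(failures, devices, hours, ea_ev, t_use_c, t_test_c,
                          confidence=0.60):
    """Upper-bound failure rate per hour at use conditions."""
    t_use_k, t_test_k = t_use_c + 273.15, t_test_c + 273.15
    af = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k))
    equivalent_device_hours = devices * hours * af
    return half_chi_squared(confidence, failures) / equivalent_device_hours


lam = failure_rate_per_hour(failures=0, devices=230, hours=1000,
                            ea_ev=0.7, t_use_c=55.0, t_test_c=125.0)
fit = lam * 1e9                       # failures per billion device hours
mttf_years = 1.0 / (lam * 24 * 365)   # mean time to failure in years
print(f"FIT = {fit:.1f}, MTTF = {mttf_years:.0f} years")
```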
To include humidity in the acceleration factor, as in the highly accelerated stress test (HAST), the acceleration factor becomes:
$$A_f = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{test}}\right)\right] \times \left(\frac{RH_{test}}{RH_{use}}\right)^{n}$$
where $RH_{test}$ is the stress test relative humidity (in percent), typically 85%;
$RH_{use}$ is the typical use relative humidity (in percent), typically measured at the chip surface, ca. 10–20%;
and $n$ is the failure mechanism scale factor, a value between 0.1 and 0.15.
To include both humidity (HAST) and voltage stress, the acceleration factor becomes:
$$A_f = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{test}}\right)\right] \times \left(\frac{RH_{test}}{RH_{use}}\right)^{n} \times \exp\left[\beta\left(V_{test} - V_{use}\right)\right]$$
where $V_{test}$ is the stress voltage (in volts), typically VCC × 1.4, e.g. 1.8 × 1.4 = 2.52 V;
$V_{use}$ is the typical usage voltage or VCC (in volts), typically 1.8 V, depending on the design;
and $\beta$ is the failure mechanism scale factor, a value between 0 and 3.0, typically 0.5 for a silicon junction defect.