Availability

Last updated

In reliability engineering, the term availability has the following meanings:

Contents

Normally high availability systems might be specified as 99.98%, 99.999% or 99.9996%.

Representation

The simplest representation of availability (A) is a ratio of the expected value of the uptime of a system to the aggregate of the expected values of up and down time (that results in the "total amount of time" C of the observation window)

Another equation for availability (A) is a ratio of the Mean Time To Failure (MTTF) and Mean Time Between Failure (MTBF), or

If we define the status function as

therefore, the availability A(t) at time t > 0 is represented by

Average availability must be defined on an interval of the real line. If we consider an arbitrary constant , then average availability is represented as

Limiting (or steady-state) availability is represented by [1]

Limiting average availability is also defined on an interval as,

Availability is the probability that an item will be in an operable and committable state at the start of a mission when the mission is called for at a random time, and is generally defined as uptime divided by total time (uptime plus downtime).

Methods and techniques to model availability

Reliability Block Diagrams or Fault Tree Analysis are developed to calculate availability of a system or a functional failure condition within a system including many factors like:

Furthermore, these methods are capable to identify the most critical items and failure modes or events that impact availability.

Definitions within systems engineering

Availability, inherent (Ai) [2] The probability that an item will operate satisfactorily at a given point in time when used under stated conditions in an ideal support environment. It excludes logistics time, waiting or administrative downtime, and preventive maintenance downtime. It includes corrective maintenance downtime. Inherent availability is generally derived from analysis of an engineering design:

  1. The impact of a repairable-element (refurbishing/remanufacture isn't repair, but rather replacement) on the availability of the system, in which it operates, equals mean time between failures MTBF/(MTBF+ mean time to repair MTTR).
  2. The impact of a one-off/non-repairable element (could be refurbished/remanufactured) on the availability of the system, in which it operates, equals the mean time to failure (MTTF)/(MTTF + the mean time to repair MTTR).

It is based on quantities under control of the designer.

Availability, achieved (Aa) [3] The probability that an item will operate satisfactorily at a given point in time when used under stated conditions in an ideal support environment (i.e., that personnel, tools, spares, etc. are instantaneously available). It excludes logistics time and waiting or administrative downtime. It includes active preventive and corrective maintenance downtime.

Availability, operational (Ao) [4] The probability that an item will operate satisfactorily at a given point in time when used in an actual or realistic operating and support environment. It includes logistics time, ready time, and waiting or administrative downtime, and both preventive and corrective maintenance downtime. This value is equal to the mean time between failure (MTBF) divided by the mean time between failure plus the mean downtime (MDT). This measure extends the definition of availability to elements controlled by the logisticians and mission planners such as quantity and proximity of spares, tools and manpower to the hardware item.

Refer to Systems engineering for more details

Basic example

If we are using equipment which has a mean time to failure (MTTF) of 81.5 years and mean time to repair (MTTR) of 1 hour:

MTTF in hours = 81.5 × 365 × 24 = 713940 (This is a reliability parameter and often has a high level of uncertainty!)
Inherent availability (Ai) = 713940 / (713940+1) = 713940 / 713941 = 99.999860%
Inherent unavailability = 1 / 713940 = 0.000140%

Outage due to equipment in hours per year = 1/rate = 1/MTTF = 0.01235 hours per year.

Literature

Availability is well established in the literature of stochastic modeling and optimal maintenance. Barlow and Proschan [1975] define availability of a repairable system as "the probability that the system is operating at a specified time t." Blanchard [1998] gives a qualitative definition of availability as "a measure of the degree of a system which is in the operable and committable state at the start of mission when the mission is called for at an unknown random point in time." This definition comes from the MIL-STD-721. Lie, Hwang, and Tillman [1977] developed a complete survey along with a systematic classification of availability.

Availability measures are classified by either the time interval of interest or the mechanisms for the system downtime. If the time interval of interest is the primary concern, we consider instantaneous, limiting, average, and limiting average availability. The aforementioned definitions are developed in Barlow and Proschan [1975], Lie, Hwang, and Tillman [1977], and Nachlas [1998]. The second primary classification for availability is contingent on the various mechanisms for downtime such as the inherent availability, achieved availability, and operational availability. (Blanchard [1998], Lie, Hwang, and Tillman [1977]). Mi [1998] gives some comparison results of availability considering inherent availability.

Availability considered in maintenance modeling can be found in Barlow and Proschan [1975] for replacement models, Fawzi and Hawkes [1991] for an R-out-of-N system with spares and repairs, Fawzi and Hawkes [1990] for a series system with replacement and repair, Iyer [1992] for imperfect repair models, Murdock [1995] for age replacement preventive maintenance models, Nachlas [1998, 1989] for preventive maintenance models, and Wang and Pham [1996] for imperfect maintenance models. A very comprehensive recent book is by Trivedi and Bobbio [2017].

Applications

Availability factor is used extensively in power plant engineering. For example, the North American Electric Reliability Corporation implemented the Generating Availability Data System in 1982. [5]

See also

Related Research Articles

Unavailability, in mathematical terms, is the probability that an item will not operate correctly at a given time and under specified conditions. It opposes availability.

Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a mechanical or electronic system during normal system operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. The term is used for repairable systems while mean time to failure (MTTF) denotes the expected time to failure for a non-repairable system.

<span class="mw-page-title-main">Weibull distribution</span> Continuous probability distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum one-day rainfalls and the time a user spends on a web page.

Mean time to recovery (MTTR) is the average time that a device will take to recover from any failure. Examples of such devices range from self-resetting fuses, to whole systems which have to be repaired or replaced.

<span class="mw-page-title-main">Service life</span> Period of time where an object can fulfill a function

A product's service life is its period of use in service. Several related terms describe more precisely a product's life, from the point of manufacture, storage, and distribution, and eventual use. Service life has been defined as "a product's total life in use from the point of sale to the point of discard" and distinguished from replacement life, "the period after which the initial purchaser returns to the shop for a replacement". Determining a product's expected service life as part of business policy involves using tools and calculations from maintainability and reliability analysis. Service life represents a commitment made by the item's manufacturer and is usually specified as a median. It is the time that any manufactured item can be expected to be "serviceable" or supported by its manufacturer.

Annualized failure rate (AFR) gives the estimated probability that a device or component will fail during a full year of use. It is a relation between the mean time between failure (MTBF) and the hours that a number of devices are run per year. AFR is estimated from a sample of like components—AFR and MTBF as given by vendors are population statistics that can not predict the behaviour of an individual unit.

Failure rate is the frequency with which an engineered system or component fails, expressed in failures per unit of time. It is usually denoted by the Greek letter λ (lambda) and is often used in reliability engineering.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

Mean time to repair (MTTR) is a basic measure of the maintainability of repairable items. It represents the average time required to repair a failed component or device. Expressed mathematically, it is the total corrective maintenance time for failures divided by the total number of corrective maintenance actions for failures during a given period of time. It generally does not include lead time for parts not readily available or other Administrative or Logistic Downtime (ALDT).

High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

In organizational management, mean down time (MDT) is the average time that a system is non-operational. This includes all downtime associated with repair, corrective and preventive maintenance, self-imposed downtime, and any logistics or administrative delays.

<span class="mw-page-title-main">Probabilistic design</span> Discipline within engineering design

Probabilistic design is a discipline within engineering design. It deals primarily with the consideration and minimization of the effects of random variability upon the performance of an engineering system during the design phase. Typically, these effects studied and optimized are related to quality and reliability. It differs from the classical approach to design by assuming a small probability of failure instead of using the safety factor. Probabilistic design is used in a variety of different applications to assess the likelihood of failure. Disciplines which extensively use probabilistic design principles include product design, quality control, systems engineering, machine design, civil engineering and manufacturing.

A prediction of reliability is an important element in the process of selecting equipment for use by telecommunications service providers and other buyers of electronic equipment, and it is essential during the design stage of engineering systems life cycle. Reliability is a measure of the frequency of equipment failures as a function of time. Reliability has a major impact on maintenance and repair costs and on the continuity of service.

Availability is the probability that a system will work as required when required during the period of a mission. The mission could be the 18-hour span of an aircraft flight. The mission period could also be the 3 to 15-month span of a military deployment. Availability includes non-operational periods associated with reliability, maintenance, and logistics.

Maintenance Philosophy is the mix of strategies that ensure an item works as expected when needed.

Software reliability testing is a field of software-testing that relates to testing a software's ability to function, given environmental conditions, for a particular amount of time. Software reliability testing helps discover many problems in the software design and functionality.

RAMP Simulation Software for Modelling Reliability, Availability and Maintainability (RAM) is a computer software application developed by WS Atkins specifically for the assessment of the reliability, availability, maintainability and productivity characteristics of complex systems that would otherwise prove too difficult, cost too much or take too long to study analytically. The name RAMP is an acronym standing for Reliability, Availability and Maintainability of Process systems.

Mean Time to Dangerous Failure. In a safety system MTTFD is the portion of failure modes that can lead to failures that may result in hazards to personnel, environment or equipment.

In the mathematical theory of probability, a generalized renewal process (GRP) or G-renewal process is a stochastic point process used to model failure/repair behavior of repairable systems in reliability engineering. Poisson point process is a particular case of GRP.

References

  1. Elsayed, E., Reliability Engineering, Addison Wesley, Reading, MA,1996
  2. "Inherent Availability (AI)". Glossary of Defense Acquisition Acronyms and Terms. Department of Defense. Archived from the original on 13 April 2014. Retrieved 10 April 2014.
  3. "Achieved Availability (AI)". Glossary of Defense Acquisition Acronyms and Terms. Department of Defense. Archived from the original on 13 April 2014. Retrieved 10 April 2014.
  4. "Operational Availability (AI)". Glossary of Defense Acquisition Acronyms and Terms. Department of Defense. Archived from the original on 12 March 2013. Retrieved 10 April 2014.
  5. "Mandatory Reporting of Conventional Generation Performance Data" (PDF). Generating Availability Data System. North American Electric Reliability Corporation. July 2011. pp. 7, 17. Archived (PDF) from the original on 2022-10-09. Retrieved 13 March 2014.

Sources