RAMP Simulation Software for Modelling Reliability, Availability and Maintainability

Developer(s) Atkins
Operating system Windows
Type Simulation software
License Proprietary

RAMP Simulation Software for Modelling Reliability, Availability and Maintainability (RAM) is a computer software application developed by WS Atkins specifically for the assessment of the reliability, availability, maintainability and productivity characteristics of complex systems that would otherwise prove too difficult, cost too much or take too long to study analytically. The name RAMP is an acronym standing for Reliability, Availability and Maintainability of Process systems.

RAMP models reliability using failure probability distributions for system elements, as well as accounting for common mode failures. RAMP models availability using logistic repair delays caused by shortages of spare parts or manpower, and their associated resource conditions defined for system elements. RAMP models maintainability using repair probability distributions for system elements, as well as preventive maintenance data and fixed logistic delays between failure detection and repair commencement.

RAMP consists of two parts:

  1. RAMP Model Builder. A front-end interactive graphical user interface (GUI).
  2. RAMP Model Processor. A back-end discrete-event simulation that employs the Monte Carlo method.

RAMP Model Builder

The RAMP Model Builder enables the user to create a block diagram describing the dependency of the process being modelled on the state of individual elements in the system.

Elements

Elements are the basic building blocks of a system modelled in RAMP and can have user-specified failure and repair characteristics in the form of probability distributions, typically of Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR) values respectively, chosen from the following:

  1. Weibull: Defined by scale and shape parameters (or optionally 50th and 95th percentiles for repairs).
  2. Negative exponential: Defined by mean average.
  3. Lognormal: Defined by median average and dispersion (or optionally 50th and 95th percentiles for repairs).
  4. Fixed (Uniform): Defined by a maximum time to failure or repair.
  5. Empirical (user-defined): Defined by a multiplier.
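
As an illustration of how times might be drawn from three of the distributions listed above, the following is a minimal sketch using Python's standard `random` module. It is not RAMP's implementation: the function `sample_time` and its parameter names are hypothetical, and the lognormal 'dispersion' is assumed here to be the standard deviation of the underlying normal distribution, which the description above does not confirm. The Fixed and Empirical options are omitted because their parameterisation is not detailed enough to sketch.

```python
import math
import random


def sample_time(rng, dist, **params):
    """Draw one time-to-failure or time-to-repair sample (hypothetical helper)."""
    if dist == "weibull":
        # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        return rng.weibullvariate(params["scale"], params["shape"])
    if dist == "exponential":
        # random.expovariate takes a rate, i.e. 1 / mean.
        return rng.expovariate(1.0 / params["mean"])
    if dist == "lognormal":
        # The median of a lognormal is exp(mu), so mu = ln(median).
        # Assumption: 'sigma' stands in for the dispersion parameter.
        return rng.lognormvariate(math.log(params["median"]), params["sigma"])
    raise ValueError(f"unsupported distribution: {dist}")


rng = random.Random(42)  # seeded so that runs are repeatable
time_to_failure = sample_time(rng, "weibull", scale=1000.0, shape=1.5)
time_to_repair = sample_time(rng, "lognormal", median=8.0, sigma=0.5)
```

Passing in a seeded `random.Random` instance rather than using the module-level functions keeps each simulated history reproducible.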

Elements can represent any part of a system from a specific failure mode of a minor component (e.g. isolation valve fails open) to major subsystems (e.g. compressor or power turbine failure) depending on the level and detail of the analysis required.

Deterministic elements

RAMP allows the user to define deterministic elements which are failure free and/or are unrepairable. These elements may be used to represent parameters of the process (e.g. purity of feedstock or production demand at a particular time) or where necessary in the modelling logic (e.g. to provide conversion factors).

Q values

Each element of the model has a user-defined process 'q value' representing a parameter of interest (e.g. mass flow, generation capacity etc.). Each element is considered to be either operating or not operating and has associated performance values q = Q or q = 0 respectively. The interpretation of each 'q value' in the model depends on the parameter of interest being modelled, which is typically chosen during the system analysis stage of model design.

Groups

Elements with interacting functionality can be organised into groups. Groups can be further combined (to any depth) to produce a Process Dependency Diagram (PDD) of the system, which is similar to a normal reliability block diagram (RBD) commonly used in reliability engineering, but also allows complex logical relationships between groups and elements to permit a more accurate representation of the process being modelled. The PDD should not be confused with a flow diagram since it describes dependency, not flow. For example, an element may appear in more than one position in the PDD if this is required to represent the true dependency of the process on that element. Groups may also be shown in full or may be compressed to allow the screen to show other areas to greater resolution.

Group types

Each group can be one of eleven group types, each with its own rule for combining 'q values' of elements and/or other groups within it to produce a 'q value' output. Groups thus define how the behaviour of each element affects the reliability, availability, maintainability and productivity of the system. The eleven group types are divided into two classes:

Five 'Flow' group types:

  1. Minimum (M): qM = min[q1, q2,...qn]
  2. Active Redundant (A): qA = min[Rating, (q1 + q2 + ... + qn)] unless qA < Cut-off, then qA = 0
  3. Standby Redundant (S): qS = as for Active Redundant, but where the first component is always assumed to be duty equipment.
  4. Time (T): qT = 0 if component with 'q value' q1 is in a "down" state when time through mission t < t0, otherwise qT = q1 + ... + qm if component with 'q value' q1 is in an "up" state when time t ≥ t0 + (m-1) x Time Delay, where m = 1 to n.
  5. Buffer (B): if the buffer is not empty qB = q2 else qB = min[q1,q2], where the buffer empties as output if component with 'q value' q2 is in an "up" state with level at time 0 = Initial Level, otherwise level at time t = level at time (t-1) - (q2 - q1), and the buffer fills as input if component with 'q value' q2 is in a "down" state with level at time 0 = Initial Level, otherwise level at time t = Capacity if level at time (t-1) + q1 > Capacity, otherwise level at time t = level at time (t-1) + (q2 - q1). Buffer input and output may also be limited by buffer constraints.
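
The two simplest 'Flow' rules above, Minimum and Active Redundant, can be sketched in Python as follows. The function names are hypothetical and the code only restates the formulas given in the list; it is not taken from RAMP itself.

```python
def q_minimum(qs):
    """Minimum (M): the group output is limited by its weakest component."""
    return min(qs)


def q_active_redundant(qs, rating, cutoff=0.0):
    """Active Redundant (A): summed output capped at the rating,
    and forced to zero if it falls below the cut-off."""
    q = min(rating, sum(qs))
    return 0.0 if q < cutoff else q


# Three pumps of 50 units each, one of them down, feeding a line rated at 80.
print(q_active_redundant([50, 50, 0], rating=80, cutoff=60))  # 80
# Only one pump up: 50 is below the cut-off of 60, so the group delivers nothing.
print(q_active_redundant([50, 0, 0], rating=80, cutoff=60))   # 0.0
```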

Six 'Logic' group types:

  1. Product (P): qP = q1 x q2 x ... x qn
  2. Quotient (Q): qQ = q1 / q2
  3. Conditionally Greater Than (G): if q1 > q2 then qG = q1 else qG = 0
  4. Conditionally Less Than (L): if q1 < q2 then qL = q1 else qL = 0
  5. Difference (D): qD = max[q1 - q2, 0]
  6. Equality (E): qE = q1 if q1 lies outside the range PA to PB, qE = q2 if q1 lies inside the range PA to PB

Three group types (Active Redundant, Standby Redundant and Time) are displayed in parallel configurations (vertically down the screen). All others are displayed in series configurations (horizontally across the screen).

Six group types (Buffer, Quotient, Conditionally Greater Than, Conditionally Less Than, Difference and Equality) contain exactly two components with 'q values' q1 and q2. All others contain two or more components with 'q values' q1, q2 to qn.
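
The six 'Logic' rules can be written out in the same way. This is a sketch under stated assumptions rather than RAMP's own code: the function names are hypothetical, the range PA to PB in the Equality rule is assumed to include its end points, and the behaviour of the Quotient rule when q2 is zero is not specified above, so Python's `ZeroDivisionError` is simply allowed to propagate.

```python
from math import prod


def q_product(qs):
    """Product (P): multiply the 'q values' of all components."""
    return prod(qs)


def q_quotient(q1, q2):
    """Quotient (Q): q1 / q2 (raises ZeroDivisionError if q2 is zero)."""
    return q1 / q2


def q_greater(q1, q2):
    """Conditionally Greater Than (G): pass q1 through only if q1 > q2."""
    return q1 if q1 > q2 else 0.0


def q_less(q1, q2):
    """Conditionally Less Than (L): pass q1 through only if q1 < q2."""
    return q1 if q1 < q2 else 0.0


def q_difference(q1, q2):
    """Difference (D): q1 - q2, floored at zero."""
    return max(q1 - q2, 0.0)


def q_equality(q1, q2, pa, pb):
    """Equality (E): q2 if q1 lies inside [pa, pb], otherwise q1.
    Assumption: the range is inclusive at both ends."""
    return q2 if pa <= q1 <= pb else q1
```

Because every rule maps 'q values' to a single 'q value', the output of one group can be passed directly as an input to an enclosing group, which is what allows groups to be nested to any depth.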

Element states

An element may be in one of five possible states and its 'q value' is determined by its state:

  1. Undergoing preventive maintenance (q = 0).
  2. Being repaired following failure, including queueing for repair (q = 0).
  3. Failed but undetected, dormant failure (q = 0). (e.g. standby equipment unavailable in the event of failure of duty equipment. Thus a problem may not be apparent until a failure of the duty equipment occurs.)
  4. Up but passive, available but not being used (q = 0). (e.g. standby equipment available in the event of failure of duty equipment.)
  5. Up and active, being used (q = Q > 0). (i.e. operating as intended.)

Occurrence of a state transition for an element is determined largely by the user-defined parameters for that element (i.e. its failure and repair distributions and any preventive maintenance cycles).

Element resource and repair conditions

There is often a time delay between an element failing and the commencement of repair of the element. This may be caused by a lack of spare parts, the unavailability of manpower, or dependencies on other elements that prevent the repair (e.g. a pump cannot be repaired because the isolating valve is defective and cannot be closed). In all of these cases, the element must be queued for repair. RAMP allows the user to define multiple resource conditions per element, all of which must be satisfied to allow a repair to be commenced. Each resource condition is one of five types:

  1. Repair Trade: a specified number of a repair trade must be available.
  2. Spare: a specified number of a spare part must be available.
  3. Group Q Value: a specified group must satisfy a condition regarding its 'q value'.
  4. Buffer Level: a specified buffer must satisfy a condition regarding its level.
  5. Element State: a specified element must satisfy a condition regarding its state.

Repair trades repair condition

Repair trades can be specified for the repair of any element, and they represent manpower in the form of a set of skilled maintenance workers with a particular trade. A repair trade can be used for the duration of an element repair (i.e. logistic delay plus a time value drawn from the element repair distribution). On completion of the repair, the repair trade becomes available to repair another element. The number of repairs which can be performed simultaneously for elements requiring a particular repair trade depends on the number of repair trade resources allocated and the number of that repair trade specified as a requirement for the repair.

Spares repair condition

If a spare part is required for an element repair, then the spare part is withdrawn from stock at the instant the repair commences (i.e. as soon as the element leaves the repair queue). The maximum number of spare parts of each type that may be held in stock is user-defined. The stock may either be replenished periodically at a user-defined time interval, or when the stock falls below a user-defined level, in which case RAMP allows a user-defined time delay that must occur between reordering and the actual replenishment of the stock.

Group Q value repair condition

RAMP allows the user to specify that an element cannot be repaired until the 'q value' of a nominated group satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint. These conditions may be used to model certain rules in a system (e.g. a pump cannot be repaired until a tank is empty).

Buffer level repair condition

Specifying a buffer level constraint means that preventive maintenance of an element can be restricted until the buffer level of a nominated buffer group satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint. These conditions may be used to model certain rules in a system (e.g. it may be a requirement for maintenance of a submersible pump that the tank it is in should be empty before repair work commences).

Element state repair condition

RAMP allows the user to specify that an element cannot be repaired until the state of another nominated element satisfies one of six conditions (>, ≥, <, ≤, =, ≠) relative to a user-defined non-negative real number repair constraint.

Repair policy

Each element has user-defined parameters that can affect how it is repaired:

  1. Logistic repair delay: A time period that must elapse before a repair can start on an element. It is a fixed time that is added to the repair time sampled from the user-defined repair probability distribution for the element. Typically, it represents a combination of the time taken for the repair team to reach the site of failure, time to isolate the failed item, and time taken to obtain the required spare part from store.
  2. Repair 'good-as-new' or 'bad-as-old': Refers to the failure rate of an element rather than its 'q-value'. By default an element is restored to 'good-as-new' following repair, but there is an option to toggle a 'bad-as-old' state that simulates a quick-fix equivalent to restoring the element to the beginning of the wear-out phase of a Weibull bathtub curve, should a Weibull probability distribution with shape greater than one be used for repairs.
  3. Repair priority: Used only if element resource and repair conditions are specified (i.e. it is only used if an element has to queue for repair rather than going directly for repair). The purpose of this field is to help determine the sequence in which elements are drawn from the repair queue as resources become available for element repair. Elements are repaired according to their repair priority, where 1 is highest priority, 2 is next highest, and so on. Elements with the same priority are repaired on a 'first come first served' basis.

In addition, each element in a Standby Redundant group has more parameters that can affect how it is repaired:

  1. Passive failure rate factor: Factor by which the element failure rate is multiplied when operating in the passive state as opposed to the active state. By default this factor is one; it is typically set between zero and one, indicating a lower passive failure rate than active failure rate.
  2. Probability of switching failure: Percentage probability that the element will fail when switched from the passive state into the active state. If such a switching failure occurs, the element must be repaired in the normal way before it can be used again.
  3. Startup delay: Startup of the element going from a passive state to an active state is delayed by a specified time.
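
The repair priority rule described above (lowest number first, ties broken 'first come first served') can be sketched with a binary heap. The class `RepairQueue` and the element names are hypothetical, and the sketch ignores the resource conditions that, in RAMP, decide when an element may actually leave the queue.

```python
import heapq
import itertools


class RepairQueue:
    """Queue ordered by repair priority, then by order of arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def push(self, element, priority):
        # Tuples compare element by element, so a lower priority number wins,
        # and an earlier arrival wins among equal priorities.
        heapq.heappush(self._heap, (priority, next(self._arrival), element))

    def pop(self):
        _priority, _order, element = heapq.heappop(self._heap)
        return element

    def __len__(self):
        return len(self._heap)


queue = RepairQueue()
queue.push("pump A", priority=2)
queue.push("isolation valve", priority=1)
queue.push("pump B", priority=2)
print([queue.pop() for _ in range(len(queue))])
# ['isolation valve', 'pump A', 'pump B']
```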

Preventive maintenance

RAMP allows the user to model preventive maintenance for each system element by cycles expressed using the three parameters 'up-time', 'down-time' and 'down-time' start time. RAMP also has an option to toggle 'intelligent preventive maintenance' on each system element, which attempts to improve system performance by doing preventive maintenance when the element is already in 'down-time' for other reasons.

Common mode failures

Common mode failures (CMFs) are failures that cause a number of elements to fail at the same time (e.g. due to the occurrence of a fire or some other catastrophic event, or the failure of a power supply that provides power to several separately defined elements). RAMP allows the user to define CMFs by stating the set of affected elements and the frequency distribution for occurrences of the CMF. When a CMF occurs, any elements which are affected by that particular CMF are placed in the failed state and must be repaired, being queued for repair if necessary. Any elements failed by a CMF will be repaired according to the repair distribution defined for that element. Elements which are already being repaired, are in the repair queue, or are undergoing preventive maintenance remain unaffected by the occurrence of an associated CMF.

Criticalities

The criticality of an element is a measure of how much the element has affected the 'q value' (i.e. performance) of the group to which it belongs. Elements with a high criticality cause more 'down-time' or unavailability on average and are thus critical to the performance of the group. The criticality of an element may vary according to the level of the group (e.g. a motor failure may have a very high criticality for a group that contains failure modes for one pump, but a very low criticality for a group that contains several redundant pumps).

Time units

RAMP allows the user to set the time unit of interest, according to scale and fidelity considerations. The only requirement is that time units should be used consistently across a model to avoid misleading results. Time units are expressed in the following input data:

  1. Element failure probability distributions.
  2. Element repair probability distributions.
  3. Element logistic delay times (before repair).
  4. Element preventive maintenance 'up-times', 'down-times' and start points.
  5. Common mode failure probability distributions.
  6. Percentile times in empirical probability distributions (for failure or repair).
  7. Delay times in Time groups.
  8. Spare part replenishment intervals or re-order delay times.
  9. Rolling average span and increment.
  10. Histogram 'down-times'.
  11. Simulated time period of interest.

Element types

Elements that are assumed to have the same failure and repair characteristics and share a common pool of spare parts can be assigned the same user-defined element type (i.e. pump, motor, tank etc.). This allows for faster construction of complex systems containing many elements that are similar in function since the entry of element data does not need to be repeated for such elements.

Import functionality

Previously built systems can be imported as subsystems of the system currently displayed. This allows for faster construction of complex systems containing many subsystems since they can be constructed in parallel by multiple users before being imported into a common system.

RAMP Model Processor

The RAMP Model Processor mimics the system operating over the time period of interest - known in RAMP as a mission - by sampling failure and repair times from probability distributions (with probabilities drawn from a pseudo-random number generator) and combining with other data defined in the RAMP Model Builder to determine state transition events for each element in the model. The simulation uses discrete events that are queued in chronological order with each event being processed in turn to determine the states and thus the 'q values' of every element in the model at that discrete point in time. Group combination rules are used to determine the 'q values' at successively higher levels of groups, culminating in 'q values' of the outermost groups that when averaged over the events of the simulation typically provide performance measures of the system, which are output in model results in terms of the chosen parameters of interest.

By running enough missions over the same time period of interest (different possible histories from the same starting point), RAMP can be used to generate statistically significant results that establish the likely distribution of the user-defined parameters of interest and thus objectively assess the system, with the confidence bands on the results dependent on the number of missions simulated. On the other hand, by running a mission length that is long in comparison with the failure frequencies and repair times, and simulating only one mission, RAMP can be used to establish the steady-state performance of the system.
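
The relationship between the number of missions and the confidence in the result can be illustrated with a deliberately small Monte Carlo sketch. It models a single element that alternates between up and down states with exponentially distributed failure and repair times, which is far simpler than a full RAMP model; the function names, parameter values and the use of the standard error as the measure of confidence are all assumptions made for illustration.

```python
import random
import statistics


def mission_availability(rng, mtbf, mttr, mission_length):
    """Simulate one mission and return the fraction of time spent up."""
    t = 0.0
    up_time = 0.0
    while t < mission_length:
        time_to_failure = rng.expovariate(1.0 / mtbf)
        # Count only the part of the up period that falls inside the mission.
        up_time += min(time_to_failure, mission_length - t)
        t += time_to_failure
        if t >= mission_length:
            break
        t += rng.expovariate(1.0 / mttr)  # repair period (element is down)
    return up_time / mission_length


def estimate_availability(n_missions, mtbf, mttr, mission_length, seed=0):
    """Return (mean availability, standard error) over n_missions histories."""
    rng = random.Random(seed)
    results = [
        mission_availability(rng, mtbf, mttr, mission_length)
        for _ in range(n_missions)
    ]
    mean = statistics.fmean(results)
    std_error = statistics.stdev(results) / n_missions ** 0.5
    return mean, std_error


for n in (25, 400):
    mean, std_error = estimate_availability(n, mtbf=100.0, mttr=10.0,
                                            mission_length=10_000.0)
    print(f"{n} missions: availability ~ {mean:.4f} +/- {std_error:.4f}")
```

For this simple element the long-run availability is MTBF / (MTBF + MTTR) = 100 / 110, about 0.909, so the printed means should sit close to that value, with the standard error shrinking roughly in proportion to the square root of the number of missions.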

History of RAMP

RAMP was originally developed by Rex Thompson & Partners Ltd. in the mid-1980s as an availability simulation program, primarily used for plant and process modelling. [1] The ownership of RAMP was transferred to T.A. Group [2] upon its founding in January 1990, [3] and then to Fluor Corporation when it acquired T.A. Group in April 1996, [4] before passing to the Advantage Technical Consulting business of parent company Advantage Business Group Ltd., [5] formed in February 2001 by a management buy-out of the consulting and information technology businesses of Fluor Corporation, operating in the transport, defence, energy and manufacturing sectors. [6] RAMP is currently owned by Atkins following its acquisition of Advantage Business Group Ltd. in March 2007. [7] Extensive redevelopment by Atkins of the original RAMP application for DOS has produced a series of RAMP applications for the Microsoft Windows platform, with the RAMP Model Builder written in Visual Basic and the RAMP Model Processor written in FORTRAN.

Uses of RAMP

Due to its inherent flexibility, RAMP is now used to optimise system design and support critical decision making in many sectors. [8] RAMP provides the capability to model many factors that may affect a system such as changes in specification or procurement contracts, 'what if' studies, sensitivity analysis, equipment redundancy, equipment criticality, delayed failures, as well as allowing the generation of results that can be exported for failure mode, effects and criticality analysis (FMECA) and cost-benefit analysis.

References

  1. "System Modeling Programs".
  2. "Epsilon M&A Deal Report - Advantage Business Group (Former T.A. Group)- Epsilon-Research".
  3. "Ph.D. In Reliability Engineering | Department of Mechanical Engineering".
  4. http://investor.fluor.com/phoenix.zhtml?c=124955&p=irol-newsArticle&ID=14654&highlight= [ dead link ]
  5. "Advantage Business Group Ltd - Company Profile and News".
  6. "Defense & Security Intelligence & Analysis: IHS Jane's | IHS".
  7. "Epsilon M&A Deal Report - Advantage Business Group (Former T.A. Group)- Epsilon-Research".
  8. Reliability, Maintainability and Risk: 7th Edition. Elsevier. David J. Smith BSc PhD CEng FIEE FIQA HonFSaRS MIGasE.