Alarm management

Last updated

Alarm management is the application of human factors and ergonomics along with instrumentation engineering and systems thinking to manage the design of an alarm system to increase its usability. Most often the major usability problem is that there are too many alarms annunciated in a plant upset, commonly referred to as alarm flood (similar to an interrupt storm), since it is so similar to a flood caused by excessive rainfall input with a basically fixed drainage output capacity. However, there can also be other problems with an alarm system such as poorly designed alarms, improperly set alarm points, ineffective annunciation, unclear alarm messages, etc. Poor alarm management is one of the leading causes of unplanned downtime, contributing to over $20B in lost production every year, and of major industrial incidents such as the one in Texas City. Developing good alarm management practices is not a discrete activity, but more of a continuous process (i.e., it is more of a journey than a destination). [1]

Contents

Alarm problem history

From their conception, large chemical, refining, power generation, and other processing plants required the use of a control system to keep the process operating successfully and producing products. Due to the fragility of the components as compared to the process, these control systems often required a control room to protect them from the elements and process conditions. In the early days of control rooms, they used what were referred to as "panel boards" which were loaded with control instruments and indicators. These were tied to sensors located in the process streams and on the outside of process equipment. The sensors relayed their information to the control instruments via analogue signals, such as a 4-20 mA current loop in the form of twisted pair wiring. At first these systems merely yielded information, and a well-trained operator was required to make adjustments either by changing flow rates, or altering energy inputs to keep the process within its designed limits.

Alarms were added to alert the operator to a condition that was about to exceed a design limit, or had already exceeded a design limit. Additionally, Emergency Shut Down (ESD) systems were employed to halt a process that was in danger of exceeding either safety, environmental or monetarily acceptable process limits. Alarm were indicated to the operator by annunciator horns, and lights of different colours. (For instance, green lights meant OK, Yellow meant not OK, and Red meant BAD.) Panel boards were usually laid out in a manner that replicated the process flow in the plant. So instrumentation indicating operating units with the plant was grouped together for recognition sake and ease of problem solution. It was a simple matter to look at the entire panel board, and discern whether any section of the plant was running poorly. This was due to both the design of the instruments and the implementation of the alarms associated with the instruments. Instrumentation companies put a lot of effort into the design and individual layout of the instruments they manufactured. To do this they employed behavioural psychology practices which revealed how much information a human being could collect in a quick glance. More complex plants had more complex panel boards, and therefore often more human operators or controllers.

Thus, in the early days of panel board systems, alarms were regulated by both size and cost. In essence, they were limited by the amount of available board space, and the cost of running wiring, and hooking up an annunciator (horn), indicator (light) and switches to flip to acknowledge, and clear a resolved alarm. It was often the case that if a new alarm was needed, an old one had to be given up.

As technology developed, the control system and control methods were tasked to continue to advance a higher degree of plant automation with each passing year. Highly complex material processing called for highly complex control methodologies. Also, global competition pushed manufacturing operations to increase production while using less energy, and producing less waste. In the days of the panel boards, a special kind of engineer was required to understand a combination of the electronic equipment associated with process measurement and control, the control algorithms necessary to control the process (PID basics), and the actual process that was being used to make the products. Around the mid 80's, we entered the digital revolution. Distributed control systems (DCS) were a boon to the industry. The engineer could now control the process without having to understand the equipment necessary to perform the control functions. Panel boards were no longer required, because all of the information that once came across analogue instruments could be digitised, stuffed into a computer and manipulated to achieve the same control actions once performed with amplifiers and potentiometers.

As a side effect, that also meant that alarms were easy and cheap to configure and deploy. You simply typed in a location, a value to alarm on and set it to active. The unintended result was that soon people alarmed everything. Initial installers set an alarm at 80% and 20% of the operating range of any variable just as a habit. The integration of programmable logic controllers, safety instrumented systems, and packaged equipment controllers has been accompanied by an overwhelming increase in associated alarms. [2] One other unfortunate part of the digital revolution was that what once covered several square yards of panel space, now had to be fit into a 17-inch computer monitor. Multiple pages of information was thus employed to replicate the information on the replaced panel board. Alarms were used to tell an operator to go look at a page he was not viewing. Alarms were used to tell an operator that a tank was filling. Every mistake made in operations usually resulted in a new alarm. With the implementation of the OSHA 1910 regulations, HAZOPS studies usually requested several new alarms. Alarms were everywhere. Incidents began to accrue as a combination of too much data collided with too little useful information.

Alarm management history

Recognizing that alarms were becoming a problem, industrial control system users banded together and formed the Alarm Management Task Force, which was a customer advisory board led by Honeywell in 1990. The AMTF included participants from chemical, petrochemical, and refining operations. They gathered and wrote a document on the issues associated with alarm management. This group quickly realised that alarm problems were simply a subset of a larger problem, and formed the Abnormal Situation Management Consortium (ASM is a registered trademark of Honeywell). The ASM Consortium developed a research proposal and was granted funding from the National Institute of Standards and Technology (NIST) in 1994. The focus of this work was addressing the complex human-system interaction and factors that influence successful performance for process operators. Automation solutions have often been developed without consideration of the human that needs to interact with the solution. In particular, alarms are intended to improve situation awareness for the control room operator, but a poorly configured alarm system does not achieve this goal.

The ASM Consortium has produced documents on best practices in alarm management, as well as operator situation awareness, operator effectiveness, and other operator-oriented issues. These documents were originally for ASM Consortium members only, but the ASMC has recently offered these documents publicly. [3]

The ASM consortium also participated in development of an alarm management guideline published by the Engineering Equipment & Materials Users' Association (EEMUA) in the UK. The ASM Consortium provided data from their member companies, and contributed to the editing of the guideline. The result is EEMUA 191 "Alarm Systems- A Guide to Design, Management and Procurement".

Several institutions and societies are producing standards on alarm management to assist their members in the best practices use of alarms in industrial manufacturing systems. Among them are the ISA (ISA 18.2), API (API 1167) and NAMUR (Namur NA 102). Several companies also offer software packages to assist users in dealing with alarm management issues. Among them are DCS manufacturing companies, and third-party vendors who offer add-on systems.

Concepts

The fundamental purpose of alarm annunciation is to alert the operator to deviations from normal operating conditions, i.e. abnormal operating situations. The ultimate objective is to prevent, or at least minimise, physical and economic loss through operator intervention in response to the condition that was alarmed. For most digital control system users, losses can result from situations that threaten environmental safety, personnel safety, equipment integrity, economy of operation, and product quality control as well as plant throughput. A key factor in operator response effectiveness is the speed and accuracy with which the operator can identify the alarms that require immediate action.

By default, the assignment of alarm trip points and alarm priorities constitute basic alarm management. Each individual alarm is designed to provide an alert when that process indication deviates from normal. The main problem with basic alarm management is that these features are static. The resultant alarm annunciation does not respond to changes in the mode of operation or the operating conditions.

When a major piece of process equipment like a charge pump, compressor, or fired heater shuts down, many alarms become unnecessary. These alarms are no longer independent exceptions from normal operation. They indicate, in that situation, secondary, non-critical effects and no longer provide the operator with important information. Similarly, during start-up or shutdown of a process unit, many alarms are not meaningful. This is often the case because the static alarm conditions conflict with the required operating criteria for start-up and shutdown.

In all cases of major equipment failure, start-ups, and shutdowns, the operator must search alarm annunciation displays and analyse which alarms are significant. This wastes valuable time when the operator needs to make important operating decisions and take swift action. If the resultant flood of alarms becomes too great for the operator to comprehend, then the basic alarm management system has failed as a system that allows the operator to respond quickly and accurately to the alarms that require immediate action. In such cases, the operator has virtually no chance to minimise, let alone prevent, a significant loss.

In short, one needs to extend the objectives of alarm management beyond the basic level. It is not sufficient to utilise multiple priority levels because priority itself is often dynamic. Likewise, alarm disabling based on unit association or suppressing audible annunciation based on priority do not provide dynamic, selective alarm annunciation. The solution must be an alarm management system that can dynamically filter the process alarms based on the current plant operation and conditions so that only the currently significant alarms are annunciated.

The fundamental purpose of dynamic alarm annunciation is to alert the operator to relevant abnormal operating situations. They include situations that have a necessary or possible operator response to ensure:

The ultimate objectives are no different from the previous basic alarm annunciation management objectives. Dynamic alarm annunciation management focuses the operator's attention by eliminating extraneous alarms, providing better recognition of critical problems, and insuring swifter, more accurate operator response. [4]

The need for alarm management

Alarm management is usually necessary in a process manufacturing environment that is controlled by an operator using a supervisory control system, such as a DCS, a SCADA or a programmable logic controller (PLC). Such a system may have hundreds of individual alarms that up until very recently have probably been designed with only limited consideration of other alarms in the system. Since humans can only do one thing at a time and can pay attention to a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that can be assimilated by a human operator, particularly when the plant is upset or in an unusual condition. Alarms also need to be capable of directing the operator's attention to the most important problem that he or she needs to act upon, using a priority to indicate degree of importance or rank, for instance. To ensure a continuous production, a seamless service, a perfect quality at any time of day or night, there must be an organisation which implies several teams of people handling, one after the other, the occurring events.

This is more commonly called the on-call management. The on-call management relies on a team of one or more persons (site manager, maintenance staff) or on external organisation (guards, telesurveillance centre). To avoid having a full-time person to monitor a single process or a level, information and / or events transmission is mandatory. This information transmission will enable the on-call staff to be more mobile, more efficient and will allow it to perform other tasks at the same time.

Some improvement methods

The techniques for achieving rate reduction range from the extremely simple ones of reducing nuisance and low value alarms to redesigning the alarm system in a holistic way that considers the relationships among individual alarms.

Design guide

This step involves documenting the methodology or philosophy of how to design alarms. It can include things such as what to alarm, standards for alarm annunciation and text messages, how the operator will interact with the alarms.

Rationalization and Documentation

This phase is a detailed review of all alarms to document their design purpose, and to ensure that they are selected and set properly and meet the design criteria. Ideally this stage will result in a reduction of alarms, but doesn't always.

Advanced methods

The above steps will often still fail to prevent an alarm flood in an operational upset, so advanced methods such as alarm suppression under certain circumstances are then necessary. As an example, shutting down a pump will always cause a low flow alarm on the pump outlet flow, so the low flow alarm may be suppressed if the pump was shut down since it adds no value for the operator, because he or she already knows it was caused by the pump being shut down. This technique can of course get very complicated and requires considerable care in design. In the above case for instance, it can be argued that the low flow alarm does add value as it confirms to the operator that the pump has indeed stopped. Process boundaries (Boundary Management) must also be taken into account.

Alarm management becomes more and more necessary as the complexity and size of manufacturing systems increases. A lot of the need for alarm management also arises because alarms can be configured on a DCS at nearly zero incremental cost, whereas in the past on physical control panel systems that consisted of individual pneumatic or electronic analogue instruments, each alarm required expenditure and control panel area, so more thought usually went into the need for an alarm. Numerous disasters such as Three Mile Island, Chernobyl accident and the Deepwater Horizon have established a clear need for alarm management.

The seven steps to alarm management

[5]

Step 1: Create and adopt an alarm philosophy

A comprehensive design and guideline document is produced which defines a plant standard employing a best-practise alarm management methodology.

Step 2: Alarm performance benchmarking

Analyze the alarm system to determine its strengths and deficiencies, and effectively map out a practical solution to improve it.

Step 3: “Bad actor” alarm resolution

From experience, it is known that around half of the entire alarm load usually comes from a relatively few alarms. The methods for making them work properly are documented, and can be applied with minimum effort and maximum performance improvement.

Step 4: Alarm documentation and rationalisation (D&R)

A full overhaul of the alarm system to ensure that each alarm complies with the alarm philosophy and the principles of good alarm management.

Step 5: Alarm system audit and enforcement

DCS alarm systems are notoriously easy to change and generally lack proper security. Methods are needed to ensure that the alarm system does not drift from its rationalised state.

Step 6: Real-time alarm management

More advanced alarm management techniques are often needed to ensure that the alarm system properly supports, rather than hinders, the operator in all operating scenarios. These include Alarm Shelving, State-Based Alarming, and Alarm Flood Suppression technologies.

Step 7: Control and maintain alarm system performance

Proper management of change and longer term analysis and KPI monitoring are needed, to ensure that the gains that have been achieved from performing the steps above do not dwindle away over time. Otherwise they will; the principle of “entropy” definitely applies to an alarm system.

See also

Notes

  1. Stauffer, Todd; Sands, Nicholas P.; Dunn, Donald G., ALARM MANAGEMENT AND ISA-18 – A JOURNEY, NOT A DESTINATION, Sellersville, PA: exida
  2. "The Pitfalls of Alarm Design and Benchmark Analysis". www.prosys.com. Archived from the original on 2016-04-15. Retrieved 2016-04-01.
  3. ASM Consortium "Effective Alarm Management Guidelines".
  4. Jensen, Leslie D. "Dynamic Alarm Management on an Ethylene Plant" Archived 2016-03-04 at the Wayback Machine . Retrieved 2008-05-22.
  5. Hollifield, Bill R. & Habibi, Eddie (2010). The Alarm Management Handbook (2 ed.). Houston, TX: PAS, Inc. pp. 35–182. ISBN   978-0-9778969-2-9.

Related Research Articles

Instrumentation is a collective term for measuring instruments, used for indicating, measuring and recording physical quantities. It is also a field of study about the art and science about making measurement instruments, involving the related areas of metrology, automation, and control theory. The term has its origins in the art and science of scientific instrument-making.

Supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications and graphical user interfaces for high-level supervision of machines and processes. It also covers sensors and other devices, such as programmable logic controllers, which interface with process plant or machinery.

<span class="mw-page-title-main">Alarm device</span> Type of signal (or device) that alerts people to a dangerous condition

An alarm device is a mechanism that gives an audible, visual or other kind of alarm signal to alert someone to a problem or condition that requires urgent attention.

A distributed control system (DCS) is a computerised control system for a process or plant usually with many control loops, in which autonomous controllers are distributed throughout the system, but there is no central operator supervisory control. This is in contrast to systems that use centralized controllers; either discrete controllers located at a central control room or within a central computer. The DCS concept increases reliability and reduces installation costs by localising control functions near the process plant, with remote monitoring and supervision.

<span class="mw-page-title-main">Remote terminal unit</span> Computer peripheral to collect telemetry data

A remote terminal unit (RTU) is a microprocessor-controlled electronic device that interfaces objects in the physical world to a distributed control system or SCADA system by transmitting telemetry data to a master system, and by using messages from the master supervisory system to control connected objects. Other terms that may be used for RTU are remote telemetry unit and remote telecontrol unit.

An industrial process control or simply process control in continuous production processes is a discipline that uses industrial control systems and control theory to achieve a production level of consistency, economy and safety which could not be achieved purely by human manual control. It is implemented widely in industries such as automotive, mining, dredging, oil refining, pulp and paper manufacturing, chemical processing and power generating plants.

An energy management system (EMS) is a system of computer-aided tools used by operators of electric utility grids to monitor, control, and optimize the performance of the generation or transmission system. Also, it can be used in small scale systems like microgrids. As electric vehicle (EV) charging becomes more popular smaller residential devices that manage when a EV can charge based on the total load vs total capacity of an electrical service are becoming popular.

<span class="mw-page-title-main">Fire alarm control panel</span> Controlling component of a fire alarm system

A fire alarm control panel (FACP), fire alarm control unit (FACU), fire indicator panel (FIP), or simply fire alarm panel is the controlling component of a fire alarm system. The panel receives information from devices designed to detect and report fires, monitors their operational integrity, and provides for automatic control of equipment, and transmission of information necessary to prepare the facility for fire based on a predetermined sequence. The panel may also supply electrical energy to operate any associated initiating device, notification appliance, control, transmitter, or relay. There are four basic types of panels: coded panels, conventional panels, addressable panels, and multiplex systems.

<span class="mw-page-title-main">Chemical plant</span> Industrial process plant that manufactures chemicals

A chemical plant is an industrial process plant that manufactures chemicals, usually on a large scale. The general objective of a chemical plant is to create new material wealth via the chemical or biological transformation and or separation of materials. Chemical plants use specialized equipment, units, and technology in the manufacturing process. Other kinds of plants, such as polymer, pharmaceutical, food, and some beverage production facilities, power plants, oil refineries or other refineries, natural gas processing and biochemical plants, water and wastewater treatment, and pollution control equipment use many technologies that have similarities to chemical plant technology such as fluid systems and chemical reactor systems. Some would consider an oil refinery or a pharmaceutical or polymer manufacturer to be effectively a chemical plant.

Power-system automation is the act of automatically controlling the power system via instrumentation and control devices. Substation automation refers to using data from Intelligent electronic devices (IED), control and automation capabilities within the substation, and control commands from remote users to control power-system devices.

<span class="mw-page-title-main">Oil terminal</span> Industrial facility for the storage of oil, petroleum and petrochemical products

An oil terminal is an industrial facility for the storage of oil, petroleum and petrochemical products, and from which these products are transported to end users or other storage facilities. An oil terminal typically has a variety of above or below ground tankage; facilities for inter-tank transfer; pumping facilities; loading gantries for filling road tankers or barges; ship loading/unloading equipment at marine terminals; and pipeline connections.

IEC standard 61511 is a technical standard which sets out practices in the engineering of systems that ensure the safety of an industrial process through the use of instrumentation. Such systems are referred to as Safety Instrumented Systems. The title of the standard is "Functional safety - Safety instrumented systems for the process industry sector".

In functional safety a safety instrumented system (SIS) is an engineered set of hardware and software controls which provides a protection layer that shuts down a chemical, nuclear, electrical, or mechanical system, or part of it, if a hazardous condition is detected.

An industrial control system (ICS) is an electronic control system and associated instrumentation used for industrial process control. Control systems can range in size from a few modular panel-mounted controllers to large interconnected and interactive distributed control systems (DCSs) with many thousands of field connections. Control systems receive data from remote sensors measuring process variables (PVs), compare the collected data with desired setpoints (SPs), and derive command functions that are used to control a process through the final control elements (FCEs), such as control valves.

An annunciator panel, also known in some aircraft as the Centralized Warning Panel (CWP) or Caution Advisory Panel (CAP), is a group of lights used as a central indicator of status of equipment or systems in an aircraft, industrial process, building or other installation. Usually, the annunciator panel includes a main warning lamp or audible signal to draw the attention of operating personnel to the annunciator panel for abnormal events or condition.

Manufacturing execution systems (MES) are computerized systems used in manufacturing to track and document the transformation of raw materials to finished goods. MES provides information that helps manufacturing decision-makers understand how current conditions on the plant floor can be optimized to improve production output. MES works as real-time monitoring system to enable the control of multiple elements of the production process.

<span class="mw-page-title-main">Instrumentation in petrochemical industries</span>

Instrumentation is used to monitor and control the process plant in the oil, gas and petrochemical industries. Instrumentation ensures that the plant operates within defined parameters to produce materials of consistent quality and within the required specifications. It also ensures that the plant is operated safely and acts to correct out of tolerance operation and to automatically shut down the plant to prevent hazardous conditions from occurring. Instrumentation comprises sensor elements, signal transmitters, controllers, indicators and alarms, actuated valves, logic circuits and operator interfaces.

IEC 62682 is a technical standard titled Management of alarms systems for the process industries.

PLC technicians design, program, repair, and maintain programmable logic controller (PLC) systems used within manufacturing and service industries ranging from industrial packaging to commercial car washes and traffic lights.

<span class="mw-page-title-main">Panel switch</span>

The Panel Machine Switching System is a type of automatic telephone exchange for urban service that was used in the Bell System in the United States for seven decades. The first semi-mechanical types of this design were installed in 1915 in Newark, New Jersey, and the last were retired in the same city in 1983.

References