Incident management

An incident is an event that could lead to loss of, or disruption to, an organization's operations, services or functions. Incident management (IcM) is a term describing the activities of an organization to identify, analyze, and correct hazards to prevent a future re-occurrence. These incidents within a structured organization are normally dealt with by either an incident response team (IRT), an incident management team (IMT), or Incident Command System (ICS). Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions. [1]

Description

An incident is an event that could lead to the loss of, or disruption to, an organization's operations, services or functions. [2] Incident management (IcM) is a term describing the activities of an organization to identify, analyze, and correct hazards to prevent a future re-occurrence. If not managed, an incident can escalate into an emergency, crisis or disaster. Incident management is therefore the process of limiting the potential disruption caused by such an event, followed by a return to business as usual. Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions. [1]

Physical incident management

The National Fire Protection Association describes an incident management system (IMS) as "the combination of facilities, equipment, personnel, procedures and communications operating within a common organizational structure, designed to aid in the management of resources during incidents". [3] [4]

Physical incident management is the real-time response that may last for hours, days, or longer. The United Kingdom Cabinet Office has produced the National Recovery Guidance (NRG), which is aimed at local responders as part of the implementation of the Civil Contingencies Act 2004 (CCA). It describes the response as the following: "Response encompasses the actions taken to deal with the immediate effects of an emergency. In many scenarios, it is likely to be relatively short and to last for a matter of hours or days – rapid implementation of arrangements for collaboration, coordination and communication is, therefore, vital. Response encompasses the effort to deal not only with the direct effects of the emergency itself (eg fighting fires, rescuing individuals) but also the indirect effects (eg disruption, media interest)". [5] [6]

The International Organization for Standardization (ISO), the world's largest developer of international standards, makes a related point in describing its risk management principles and guidelines document, ISO 31000:2009: "Using ISO 31000 can help organizations increase the likelihood of achieving objectives, improve the identification of opportunities and threats and effectively allocate and use resources for risk treatment". [7] This underlines the importance not just of good planning but of effectively allocating resources to treat the risk.

Computer security incident management

Because of the rise of internet crime, the Computer Security Incident Response Team (CSIRT) now plays an important role, and computer security incidents are a common type of incident faced by companies in developed nations across the world. For example, if an organization discovers that an intruder has gained unauthorized access to a computer system, the CSIRT would analyze the situation, determine the breadth of the compromise, and take corrective action.
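The CSIRT example above (analyze the situation, determine the breadth of the compromise, take corrective action) can be sketched as a simple ordered workflow. This is only an illustrative sketch: the `Stage` names, the `Incident` class and the host name are hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Stage names are illustrative assumptions, not a standard taxonomy.
    DETECTED = "detected"
    ANALYZED = "analyzed"
    SCOPED = "scoped"
    CORRECTED = "corrected"


# Each stage may only advance to the one that follows it.
NEXT_STAGE = {
    Stage.DETECTED: Stage.ANALYZED,
    Stage.ANALYZED: Stage.SCOPED,
    Stage.SCOPED: Stage.CORRECTED,
}


@dataclass
class Incident:
    summary: str
    stage: Stage = Stage.DETECTED
    affected_systems: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Move to the next stage and record what was done."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("incident is already corrected")
        self.stage = NEXT_STAGE[self.stage]
        self.log.append((self.stage.value, note))


incident = Incident("Unauthorized access to a computer system")
incident.advance("Analyzed the situation")
incident.affected_systems.append("hypothetical-host-01")  # made-up host name
incident.advance("Determined the breadth of the compromise")
incident.advance("Took corrective action")
```

Recording a note at each transition leaves a log that a later post-incident analysis could draw on.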

According to data on hacking incidents in 2009, over half of the hacking attempts on transnational corporations (TNCs) took place in North America (57%), and 23% took place in Europe. [8] A well-rounded Computer Security Incident Response Team is integral to providing a secure environment for any organization, and is becoming a critical part of the overall design of many modern networking teams.

Roles

Incidents within a structured organization are normally dealt with by either an incident response team (IRT), or an incident management team (IMT). These are often designated beforehand or during the event and are placed in control of the organization whilst the incident is dealt with, to restore normal functions. The incident commander manages the response to a security incident and leads the members of the incident response team(s) through the process, as defined by the Incident Command System (ICS). [9]

Usually, as part of the wider management process in private organizations, incident management is followed by post-incident analysis where it is determined why the incident happened despite precautions and controls. This analysis is normally overseen by the leaders of the organization, with the view of preventing a repetition of the incident through precautionary measures and often changes in policy. This information is then used as feedback to further develop the security policy and/or its practical implementation. In the United States, the National Incident Management System, developed by the Department of Homeland Security, integrates effective practices in emergency management into a comprehensive national framework. This often results in a higher level of contingency planning, exercise and training, as well as an evaluation of the management of the incident. [10]

Root cause analysis

Human factors

During the root cause analysis, human factors should be assessed. James Reason conducted a study into the understanding of adverse effects of human factors. [11] The study found that major incident investigations, such as Piper Alpha and Kings Cross Underground Fire, made it clear that the causes of the accidents were distributed widely within and outside the organization. Two types of failure are distinguished: active failures, which are actions that have immediate effects and are likely to cause an accident; and latent (delayed) failures, whose effects can take years to appear and which usually combine with triggering events to cause the accident.

Latent failures are created as the result of decisions taken at the higher echelons of an organisation. Their damaging consequences may lie dormant for a long time, only becoming evident when they combine with local triggering factors (e.g., the spring tide, the loading difficulties at Zeebrugge harbour, etc.) to breach the system's defences. Decisions taken in the higher echelons of an organization can make an accident more likely: planning, scheduling, forecasting, designing, policymaking and similar activities can have a slow-burning effect. The actual unsafe act that triggers an accident can be traced back through the organization, exposing the accumulation of latent failures within the system as a whole that made the accident more likely and ultimately led to it. Better-targeted improvement actions can then be applied to reduce the likelihood of the event happening again. [12]

Related Research Articles

Risk management: Identification, evaluation and control of risks

Risk management is the identification, evaluation, and prioritization of risks followed by coordinated and economical application of resources to minimize, monitor, and control the probability or impact of unfortunate events or to maximize the realization of opportunities.

Security management is the identification of an organization's assets, including people, buildings, machines, systems and information assets, followed by the development, documentation, and implementation of policies and procedures for protecting those assets.

Business continuity planning: Prevention and recovery from threats that might affect a company

Business continuity may be defined as "the capability of an organization to continue the delivery of products or services at pre-defined acceptable levels following a disruptive incident", and business continuity planning is the process of creating systems of prevention and recovery to deal with potential threats to a company. In addition to prevention, the goal is to enable ongoing operations before and during execution of disaster recovery. Business continuity is the intended outcome of proper execution of both business continuity planning and disaster recovery.

Crisis management is the process by which an organization deals with a disruptive and unexpected event that threatens to harm the organization or its stakeholders. The study of crisis management originated with large-scale industrial and environmental disasters in the 1980s. It is considered to be the most important process in public relations.

Disaster recovery is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle. It employs policies, tools, and procedures. Disaster recovery focuses on information technology (IT) or technology systems supporting critical business functions as opposed to business continuity. This involves keeping all essential aspects of a business functioning despite significant disruptive events; it can therefore be considered a subset of business continuity. Disaster recovery assumes that the primary site is not immediately recoverable and restores data and services to a secondary site.

Emergency management: Dealing with all humanitarian aspects of emergencies

Emergency management or disaster management is a science and a system charged with creating the framework within which communities reduce vulnerability to hazards and cope with disasters. Emergency management, despite its name, does not actually focus on the management of emergencies, which can be understood as minor events with limited impacts and are managed through the day-to-day functions of a community. Instead, emergency management focuses on the management of disasters, which are events that produce more impacts than a community can handle on its own. The management of disasters tends to require some combination of activity from individuals and households, organizations, local, and/or higher levels of government. Although many different terminologies exist globally, the activities of emergency management can be generally categorized into preparedness, response, mitigation, and recovery, although other terms such as disaster risk reduction and prevention are also common. The outcome of emergency management is to prevent disasters and where this is not possible, to reduce their harmful impacts.

U.S. critical infrastructure protection

In the U.S., critical infrastructure protection (CIP) is a concept that relates to the preparedness and response to serious incidents that involve the critical infrastructure of a region or the nation. The American Presidential directive PDD-63 of May 1998 set up a national program of "Critical Infrastructure Protection". In 2014 the NIST Cybersecurity Framework was published after further presidential directives.

Risk register

A risk register (PRINCE2) is a document used as a risk management tool and to fulfill regulatory compliance. It acts as a repository for all identified risks and includes additional information about each risk, e.g., the nature of the risk, its reference and owner, and mitigation measures. It can be displayed as a scatterplot or as a table.
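As a rough illustration of the table form, a risk register can be modelled as a list of records carrying the fields mentioned above. The `RiskEntry` class, the `render_table` helper and the sample entries are hypothetical and are not taken from PRINCE2 or any other method.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    reference: str   # identifier for the risk
    nature: str      # nature of the risk
    owner: str       # person or role responsible
    mitigation: str  # mitigation measures


def render_table(entries):
    """Render the register as a plain-text table with aligned columns."""
    headers = ["Reference", "Nature of the risk", "Owner", "Mitigation measures"]
    rows = [[e.reference, e.nature, e.owner, e.mitigation] for e in entries]
    # Column width is the longest cell in that column, header included.
    widths = [max(len(cell) for cell in col) for col in zip(headers, *rows)]

    def fmt(row):
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths))

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Made-up sample entries for illustration only.
register = [
    RiskEntry("R-001", "Server room flooding", "Facilities lead", "Raise equipment racks"),
    RiskEntry("R-002", "Key supplier failure", "Procurement", "Qualify a second supplier"),
]
table = render_table(register)
print(table)
```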

In the fields of computer security and information technology, computer security incident management involves the monitoring and detection of security events on a computer or computer network, and the execution of proper responses to those events. Computer security incident management is a specialized form of incident management, the primary purpose of which is the development of a well understood and predictable response to damaging events and computer intrusions.

ISO/IEC 27005 "Information technology — Security techniques — Information security risk management" is an international standard published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) providing good practice guidance on managing risks to information. It is a core part of the ISO/IEC 27000-series of standards, commonly known as ISO27k.

ISO/TC 223 Societal security was a technical committee of the International Organization for Standardization formed in 2001 to develop standards in the area of societal security: i.e. protection of society from and response to incidents, emergencies, and disasters caused by intentional and unintentional human acts, natural hazards, and technical failures.

Information technology risk, IT risk, IT-related risk, or cyber risk is any risk relating to information technology. While information has long been appreciated as a valuable and important asset, the rise of the knowledge economy and the Digital Revolution has led to organizations becoming increasingly dependent on information, information processing and especially IT. Various events or incidents that compromise IT in some way can therefore cause adverse impacts on the organization's business processes or mission, ranging from inconsequential to catastrophic in scale.

Supply chain risk management: Preventing failures in logistics

Supply chain risk management (SCRM) is "the implementation of strategies to manage both everyday and exceptional risks along the supply chain based on continuous risk assessment with the objective of reducing vulnerability and ensuring continuity".

ISO 31000 is a family of standards relating to risk management codified by the International Organization for Standardization. ISO 31000:2018 provides principles and generic guidelines on managing the risks faced by organizations, since negative risks can have consequences for economic performance and professional reputation.

Risk: The possibility of something bad happening

In simple terms, risk is the possibility of something bad happening. Risk involves uncertainty about the effects/implications of an activity with respect to something that humans value, often focusing on negative, undesirable consequences. Many different definitions have been proposed. The international standard definition of risk for common understanding in different applications is "effect of uncertainty on objectives".

In computer security, a threat is a potential negative action or event facilitated by a vulnerability that results in an unwanted impact to a computer system or application.

Human factors are the physical or cognitive properties of individuals, or social behavior which is specific to humans, and influence functioning of technological systems as well as human-environment equilibria. The safety of underwater diving operations can be improved by reducing the frequency of human error and the consequences when it does occur. Human error can be defined as an individual's deviation from acceptable or desirable practice which culminates in undesirable or unexpected results.

Dive safety is primarily a function of four factors: the environment, equipment, individual diver performance and dive team performance. The water is a harsh and alien environment which can impose severe physical and psychological stress on a diver. The remaining factors must be controlled and coordinated so the diver can overcome the stresses imposed by the underwater environment and work safely. Diving equipment is crucial because it provides life support to the diver, but the majority of dive accidents are caused by individual diver panic and an associated degradation of the individual diver's performance. - M.A. Blumenberg, 1996

Disaster preparedness (cultural property): Preserving and protecting cultural artifact collections

Disaster preparedness in museums, galleries, libraries, archives and private collections, involves any actions taken to plan for, prevent, respond or recover from natural disasters and other events that can cause damage or loss to cultural property. 'Disasters' in this context may include large-scale natural events such as earthquakes, flooding or bushfire, as well as human-caused events such as theft and vandalism. Increasingly, anthropogenic climate change is a factor in cultural heritage disaster planning, due to rising sea levels, changes in rainfall patterns, warming average temperatures, and more frequent extreme weather events.

ISO 22300:2021, Security and resilience – Vocabulary, is an international standard developed by ISO/TC 292 Security and resilience. This document defines terms used in security and resilience standards and includes 360 terms and definitions. This edition was published at the beginning of 2021 and replaces the second edition from 2018.

ISO 22322:2022 is an international standard developed by the ISO/TC 292 Security and resilience committee. Its first edition was published by the International Organization for Standardization (ISO) in 2015.

References

  1. "What qualifies as an 'incident'?". Business Link. Archived from the original on 2011-06-15. Retrieved 2018-01-04.
  2. "Dictionary of business continuity management terms" (PDF). Business Continuity Institute. Archived from the original (PDF) on 2015-04-30. Retrieved 2015-09-03.
  3. "List of NFPA Codes and Standards". National Fire Protection Association. 2013. Retrieved 10 April 2013.
  4. "Incident Management". Ready.gov. 2012. Archived from the original on 12 April 2013. Retrieved 10 April 2013.
  5. "National Recovery Guidance". GOV.UK. 2007. Retrieved 10 April 2013.
  6. "Civil Contingencies Act 2004". legislation.gov.uk. 2012. Retrieved 10 April 2013.
  7. "ISO 31000 Risk management". International Organization for Standardization. 2009. Retrieved 13 April 2013.
  8. "Hacking Incidents 2009 – Interesting Data". Roger's Security Blog. TechNet Blogs. 12 Mar 2010. Archived from the original on Sep 24, 2012. Retrieved 2012-11-17.
  9. FEMA. "Incident Command System" (PDF). Retrieved 2024-01-30.
  10. "About the Contingency Planning and Incident Management Division". Homeland Security. Archived from the original on April 2, 2012. Retrieved 2012-11-17.
  11. Reason J (June 1995). "Understanding adverse events: human factors". Quality in Health Care. 4 (2): 80–9. doi:10.1136/qshc.4.2.80. PMC 1055294. PMID 10151618.
  12. O’Callaghan, Katherine Mary, Incident Management: Human Factors and Minimising Mean Time to Restore, Ph.D. Thesis, Australian Catholic University, 2010. Archived 2011-09-17 at the Wayback Machine.
