Business administration |
---|
Management of a business |
An incident is an event that could lead to loss of, or disruption to, an organization's operations, services or functions. Incident management (IcM) is a term describing the activities of an organization to identify, analyze, and correct hazards to prevent a future re-occurrence. These incidents within a structured organization are normally dealt with by either an incident response team (IRT), an incident management team (IMT), or Incident Command System (ICS). Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions. [1]
An incident is an event that could lead to the loss of, or disruption to, an organization's operations, services or functions. [2] Incident management (IcM) is a term describing the activities of an organization to identify, analyze, and correct hazards to prevent a future re-occurrence. If not managed, an incident can escalate into an emergency, crisis or disaster. Incident management is therefore the process of limiting the potential disruption caused by such an event, followed by a return to business as usual. Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions. [1]
National Fire Protection Association states that incident management can be described as, '[a]n IMS [incident management system] is "the combination of facilities, equipment, personnel, procedures and communications operating within a common organizational structure, designed to aid in the management of resources during incidents". [3] [4]
Physical incident management is the real-time response that may last for hours, days, or longer. The United Kingdom Cabinet Office has produced the National Recovery Guidance (NRG), which is aimed at local responders as part of the implementation of the Civil Contingencies Act 2004 (CCA). It describes the response as the following: "Response encompasses the actions taken to deal with the immediate effects of an emergency. In many scenarios, it is likely to be relatively short and to last for a matter of hours or days – rapid implementation of arrangements for collaboration, coordination and communication is, therefore, vital. Response encompasses the effort to deal not only with the direct effects of the emergency itself (eg fighting fires, rescuing individuals) but also the indirect effects (eg disruption, media interest)". [5] [6]
International Organization for Standardization (ISO), which is the world's largest developer of international standards also makes a point in the description of its risk management, principles and guidelines document ISO 31000:2009 that, "Using ISO 31000 can help organizations increase the likelihood of achieving objectives, improve the identification of opportunities and threats and effectively allocate and use resources for risk treatment". [7] This again shows the importance of not just good planning but the effective allocation of resources to treat the risk.
Today, an important role is played by a Computer Security Incident Response Team (CSIRT), due to the rise of internet crime, and is a common example of an incident faced by companies in developed nations all across the world. For example, if an organization discovers that an intruder has gained unauthorized access to a computer system, the CSIRT would analyze the situation, determine the breadth of the compromise, and take corrective action.
Currently, over half of the world's hacking attempts on Trans National Corporations (TNCs) take place in North America (57%). 23% of attempts take place in Europe. [8] Having a well-rounded Computer Security Incident Response team is integral to providing a secure environment for any organization, and is becoming a critical part of the overall design of many modern networking teams.
Incidents within a structured organization are normally dealt with by either an incident response team (IRT), or an incident management team (IMT). These are often designated beforehand or during the event and are placed in control of the organization whilst the incident is dealt with, to restore normal functions. The incident commander manages the response to a security incident and leads the members of the incident response team(s) through the process, as defined by the Incident Command System (ICS). [9]
Usually, as part of the wider management process in private organizations, incident management is followed by post-incident analysis where it is determined why the incident happened despite precautions and controls. This analysis is normally overseen by the leaders of the organization, with the view of preventing a repetition of the incident through precautionary measures and often changes in policy. This information is then used as feedback to further develop the security policy and/or its practical implementation. In the United States, the National Incident Management System, developed by the Department of Homeland Security, integrates effective practices in emergency management into a comprehensive national framework. This often results in a higher level of contingency planning, exercise and training, as well as an evaluation of the management of the incident. [10]
This section relies largely or entirely on a single source .(January 2018) |
During the root cause analysis, human factors should be assessed. James Reason conducted a study into the understanding of adverse effects of human factors. [11] The study found that major incident investigations, such as Piper Alpha and Kings Cross Underground Fire, made it clear that the causes of the accidents were distributed widely within and outside the organization. There are two types of events: active failure—an action that has immediate effects and has the likelihood to cause an accident—and latent or delayed action—events can take years to have an effect and are usually combined with triggering events that then cause the accident.
Latent failures are created as the result of decisions taken at the higher echelons of an organisation. Their damaging consequences may lie dormant for a long time, only becoming evident when they combine with local triggering factors (e.g., the spring tide, the loading difficulties at Zeebrugge harbour, etc.) to breach the system's defences. Decisions taken in the higher echelons of an organization can trigger the events towards an accident becoming more likely, the planning, scheduling, forecasting, designing, policymaking, etc., can have a slow burning effect. The actual unsafe act that triggers an accident can be traced back through the organization and the subsequent failures can be exposed, showing the accumulation of latent failures within the system as a whole that led to the accident becoming more likely and ultimately happening. Better improvement action can be applied, and reduce the likelihood of the event happening again. [12]
Incident management is an important part of IT service management (ITSM) process area. [13] The first goal of the incident management process is to restore a normal service operation as quickly as possible and to minimize the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. 'Normal service operation' is defined here as service operation within service-level agreement (SLA). It is one process area within the broader ITIL and ISO 20000 environment.
ISO 20000 defines the objective of Incident management (part 1, 8.2) as: To restore agreed service to the business as soon as possible or to respond to service requests. [14]
ITIL 2011 defines an incident as:
The ITIL incident management process ensures that normal service operation is restored as quickly as possible and the business impact is minimized.ITIL Service Operation. AXELOS. 30 May 2007. ISBN 978-0113310463.
The main challenges and cause for problems in the Incident management are:
Risk management is the identification, evaluation, and prioritization of risks, followed by the minimization, monitoring, and control of the impact or probability of those risks occurring.
Security management is the identification of an organization's assets i.e. including people, buildings, machines, systems and information assets, followed by the development, documentation, and implementation of policies and procedures for protecting assets.
Business continuity may be defined as "the capability of an organization to continue the delivery of products or services at pre-defined acceptable levels following a disruptive incident", and business continuity planning is the process of creating systems of prevention and recovery to deal with potential threats to a company. In addition to prevention, the goal is to enable ongoing operations before and during execution of disaster recovery. Business continuity is the intended outcome of proper execution of both business continuity planning and disaster recovery.
In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. It is widely used in IT operations, manufacturing, telecommunications, industrial process control, accident analysis (e.g., in aviation, rail transport, or nuclear plants), medical diagnosis, the healthcare industry (e.g., for epidemiology), etc. Root cause analysis is a form of inductive inference (first create a theory, or root, based on empirical evidence, or causes) and deductive inference (test the theory, i.e., the underlying causal mechanisms, with empirical data).
An organizational crisis is described as a rare, high-impact event that jeopardizes the organization's survival. It is marked by uncertainty regarding the cause, effects, and solutions, along with the need for rapid decision-making.
IT disaster recovery (also, simply disaster recovery (DR)) is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle. DR employs policies, tools, and procedures with a focus on IT systems supporting critical business functions. This involves keeping all essential aspects of a business functioning despite significant disruptive events; it can therefore be considered a subset of business continuity (BC). DR assumes that the primary site is not immediately recoverable and restores data and services to a secondary site.
Information technology service management (ITSM) are the activities performed by an organization to design, build, deliver, operate and control IT services offered to customers.
Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning covers the entire organization, while disaster recovery focuses on IT.
In the U.S., critical infrastructure protection (CIP) is a concept that relates to the preparedness and response to serious incidents that involve the critical infrastructure of a region or the nation. The American Presidential directive PDD-63 of May 1998 set up a national program of "Critical Infrastructure Protection". In 2014 the NIST Cybersecurity Framework was published after further presidential directives.
ITIL security management describes the structured fitting of security into an organization. ITIL security management is based on the ISO 27001 standard. "ISO/IEC 27001:2005 covers all types of organizations. ISO/IEC 27001:2005 specifies the requirements for establishing, implementing, operating, monitoring, reviewing, maintaining and improving a documented Information Security Management System within the context of the organization's overall business risks. It specifies requirements for the implementation of security controls customized to the needs of individual organizations or parts thereof. ISO/IEC 27001:2005 is designed to ensure the selection of adequate and proportionate security controls that protect information assets and give confidence to interested parties."
In the fields of computer security and information technology, computer security incident management involves the monitoring and detection of security events on a computer or computer network, and the execution of proper responses to those events. Computer security incident management is a specialized form of incident management, the primary purpose of which is the development of a well understood and predictable response to damaging events and computer intrusions.
ISO/IEC 27005 "Information technology — Security techniques — Information security risk management" is an international standard published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) providing good practice guidance on managing risks to information. It is a core part of the ISO/IEC 27000-series of standards, commonly known as ISO27k.
ISO/TC 223 Societal security was a technical committee of the International Organization for Standardization formed in 2001 to develop standards in the area of societal security: i.e. protection of society from and response to incidents, emergencies, and disasters caused by intentional and unintentional human acts, natural hazards, and technical failures.
Information technology risk, IT risk, IT-related risk, or cyber risk is any risk relating to information technology. While information has long been appreciated as a valuable and important asset, the rise of the knowledge economy and the Digital Revolution has led to organizations becoming increasingly dependent on information, information processing and especially IT. Various events or incidents that compromise IT in some way can therefore cause adverse impacts on the organization's business processes or mission, ranging from inconsequential to catastrophic in scale.
Security level management (SLM) comprises a quality assurance system for information system security.
Supply chain risk management (SCRM) is "the implementation of strategies to manage both everyday and exceptional risks along the supply chain based on continuous risk assessment with the objective of reducing vulnerability and ensuring continuity".
In computer security, a threat is a potential negative action or event enabled by a vulnerability that results in an unwanted impact to a computer system or application.
Human factors are the physical or cognitive properties of individuals, or social behavior which is specific to humans, and which influence functioning of technological systems as well as human-environment equilibria. The safety of underwater diving operations can be improved by reducing the frequency of human error and the consequences when it does occur. Human error can be defined as an individual's deviation from acceptable or desirable practice which culminates in undesirable or unexpected results. Human factors include both the non-technical skills that enhance safety and the non-technical factors that contribute to undesirable incidents that put the diver at risk.
[Safety is] An active, adaptive process which involves making sense of the task in the context of the environment to successfully achieve explicit and implied goals, with the expectation that no harm or damage will occur. – G. Lock, 2022
Dive safety is primarily a function of four factors: the environment, equipment, individual diver performance and dive team performance. The water is a harsh and alien environment which can impose severe physical and psychological stress on a diver. The remaining factors must be controlled and coordinated so the diver can overcome the stresses imposed by the underwater environment and work safely. Diving equipment is crucial because it provides life support to the diver, but the majority of dive accidents are caused by individual diver panic and an associated degradation of the individual diver's performance. – M.A. Blumenberg, 1996
Disaster preparedness in museums, galleries, libraries, archives and private collections, involves any actions taken to plan for, prevent, respond or recover from natural disasters and other events that can cause damage or loss to cultural property. 'Disasters' in this context may include large-scale natural events such as earthquakes, flooding or bushfire, as well as human-caused events such as theft and vandalism. Increasingly, anthropogenic climate change is a factor in cultural heritage disaster planning, due to rising sea levels, changes in rainfall patterns, warming average temperatures, and more frequent extreme weather events.
ISO 22300:2021, Security and resilience – Vocabulary, is an international standard developed by ISO/TC 292 Security and resilience. This document defines terms used in security and resilience standards and includes 360 terms and definitions. This edition was published in the beginning of 2021 and replaces the second edition from 2018.