Event management (ITIL)

Event Management, as defined by ITIL, is the process that monitors all events that occur throughout the IT infrastructure. It allows for normal operation and also detects and escalates exception conditions.

An event can be defined as any detectable or discernible occurrence that is significant for managing the IT infrastructure or delivering an IT service, and for evaluating the impact a deviation might have on the services. Events are typically notifications created by an IT service, Configuration Item (CI), or monitoring tool.

Purpose/scope

Event handling

Event notification and detection

Event notifications can be proprietary, in which case only certain management tools can detect them. Most Configuration Items (CIs) generate event notifications using SNMP (Simple Network Management Protocol), an open protocol.
The CIs are configured to generate a set of events based on the designer's experience.
Once an event notification has been generated, it is detected (read and interpreted) by the relevant management tool.

Event filtering

Filtering means that the event notification can be ignored or communicated to the management tool. If ignored, the event will usually be recorded in a log file on the device, but no further action will be taken.
During the filtering step, the event is assigned a level of correlation (type: informational, warning, or exception).
The filtering step is not always mandatory: some CIs communicate significant events directly to the management tool, even if they are duplicated.
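
The filtering step described above can be sketched in Python. This is only an illustration under stated assumptions: the `EventNotification` fields, the numeric thresholds, and the function names are hypothetical and do not come from ITIL or from any particular management tool. The sketch shows an event being either ignored (and recorded only in a device-side log) or passed on with one of the three types: informational, warning, or exception.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class EventNotification:
    ci_name: str         # Configuration Item that raised the notification
    message: str
    metric_value: float  # hypothetical reading, e.g. disk usage in percent


# Hypothetical thresholds; real values depend on the CI and the service design.
IGNORE_BELOW = 10.0
WARNING_THRESHOLD = 80.0
EXCEPTION_THRESHOLD = 95.0

# Stands in for the log file kept on the device for ignored events.
device_log: List[EventNotification] = []


def classify(event: EventNotification) -> Significance:
    """Assign one of the three event types based on the reading."""
    if event.metric_value >= EXCEPTION_THRESHOLD:
        return Significance.EXCEPTION
    if event.metric_value >= WARNING_THRESHOLD:
        return Significance.WARNING
    return Significance.INFORMATIONAL


def filter_event(event: EventNotification) -> Optional[Significance]:
    """Return None if the event is ignored, otherwise its significance."""
    if event.metric_value < IGNORE_BELOW:
        device_log.append(event)  # recorded locally, no further action
        return None
    return classify(event)  # communicated to the management tool
```

A threshold on a single reading is only one possible filtering rule; a real tool might also filter on event source, frequency, or duplication.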

Significance of event

Standard categorization based on the significance of an event:

Note: the addition below is not an event type but an analysis that can be carried out from the event logs:

Response

At this point in the process, several response options are available, including:

Incident Record: an incident can be generated when an exception is detected.
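
As a rough sketch of this response option, the Python below opens an incident record only when an event is typed as an exception. The `ServiceDesk` and `IncidentRecord` classes, their fields, and the `respond` function are hypothetical names chosen for illustration; they are not part of ITIL or of any specific service desk product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentRecord:
    incident_id: int
    ci_name: str
    description: str


@dataclass
class ServiceDesk:
    """Hypothetical in-memory stand-in for an incident management tool."""
    incidents: List[IncidentRecord] = field(default_factory=list)

    def open_incident(self, ci_name: str, description: str) -> IncidentRecord:
        # Sequential ids are an assumption made to keep the sketch simple.
        record = IncidentRecord(len(self.incidents) + 1, ci_name, description)
        self.incidents.append(record)
        return record


def respond(desk: ServiceDesk, ci_name: str, significance: str,
            description: str) -> Optional[IncidentRecord]:
    """Generate an incident for exceptions; take no action otherwise."""
    if significance == "exception":
        return desk.open_incident(ci_name, description)
    return None
```

For example, `respond(ServiceDesk(), "web-1", "exception", "service down")` returns a new `IncidentRecord`, while the same call with `"warning"` returns `None`.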

Close event
