Event correlation is a technique for making sense of a large number of events and pinpointing the few events that are really important in that mass of information. This is accomplished by looking for and analyzing relationships between events.
Event correlation has been used in various fields for many years. Integrated management is itself traditionally subdivided into several such fields (for example, network management, systems management and service-level management), and event correlation takes place in different components depending on the field of study.
In this article, we focus on event correlation in integrated management and provide links to other fields.
The goal of integrated management is to integrate the management of networks (data, telephone and multimedia), systems (servers, databases and applications) and IT services in a coherent manner. The scope of this discipline notably includes network management, systems management and service-level management.
Event correlation usually takes place inside one or several management platforms. It is implemented by a piece of software known as the event correlator. This component is automatically fed with events originating from managed elements (applications, devices), monitoring tools, the trouble ticket system, etc. Each event captures something noteworthy (from the event source's standpoint) that happened in the domain of interest to the event correlator; this domain varies with the type of analysis the correlator is attempting to perform.
The event correlator plays a key role in integrated management, for only within it do events from many disparate sources come together and allow for comparison across sources. For instance, this is where the failure of a service can be ascribed to a specific failure in the underlying IT infrastructure, or where the root cause of a potential security attack can be identified.
Most event correlators can receive events from trouble ticket systems. However, only some of them are able to notify trouble ticket systems when a problem is solved, which partly explains why service desks find it difficult to stay up to date. In theory, integrating management across an organization requires the communication between the event correlator and the trouble ticket system to work both ways.
An event may convey an alarm or report an incident (which explains why event correlation used to be called alarm correlation), but not necessarily: it may also report that a situation has returned to normal, or simply convey information that the source deems relevant (e.g., policy P has been updated on device D). The severity of an event is an indication, given by the event source to the event destination, of the priority this event should be given while being processed.
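To make the discussion concrete, the sketch below shows one possible shape for such an event record. It is a minimal illustration in Python; all field names are hypothetical rather than taken from any standard or product.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Event:
    """Minimal illustrative event record; all field names are hypothetical."""
    source: str                # managed element that emitted the event, e.g. "server-S"
    kind: str                  # what happened, e.g. "link_down" or "back_to_normal"
    severity: int              # priority hint given by the source to the destination
    message: str = ""          # free-text details, e.g. "policy P updated on device D"
    timestamp: float = field(default_factory=time.time)
```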
Event correlation can be decomposed into four steps: event filtering, event aggregation, event masking and root cause analysis. A fifth step (action triggering) is often associated with event correlation and therefore briefly mentioned here.
Event filtering consists of discarding events that the event correlator deems irrelevant. For instance, some bottom-of-the-range devices are difficult to configure and occasionally send events of no interest to the management platform (e.g., printer P needs A4 paper in tray 1). Another example is the filtering out of informational or debugging events by an event correlator that is only interested in availability and faults.
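A minimal sketch of such a filter follows. Events are plain dictionaries here for brevity, and the event kinds, fields and severity threshold are all illustrative assumptions, not part of any particular platform.

```python
# Event filtering sketch: discard events the correlator does not care about.
UNINTERESTING = {"printer_paper_low", "debug", "informational"}

def keep(event: dict) -> bool:
    """Return True if the event should be passed on for correlation."""
    return event["kind"] not in UNINTERESTING and event["severity"] >= 3

events = [
    {"source": "printer-P", "kind": "printer_paper_low", "severity": 1},
    {"source": "router-R", "kind": "link_down", "severity": 5},
]
relevant = [e for e in events if keep(e)]  # only the link_down event survives
```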
Event aggregation is a technique whereby multiple events that are very similar (but not necessarily identical) are combined into an aggregate that represents the underlying event data. Its main objective is to summarize a collection of input events into a smaller collection that can be processed using various analytics methods. For example, the aggregate may provide statistical summaries of the underlying events and of the resources affected by those events. Another example is temporal aggregation, in which repeated reports of the same problem from the event source are collapsed into a single event until the problem is finally solved.
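The following sketch illustrates one way such an aggregate could be computed, grouping events by source and kind and summarizing each group with a count and a time span. Events are plain dictionaries, and every field name is an assumption made for the example.

```python
from collections import defaultdict

def aggregate(events: list[dict]) -> list[dict]:
    """Collapse similar events (same source and kind) into aggregates that
    summarize the underlying data. Field names here are illustrative."""
    groups = defaultdict(list)
    for e in events:
        groups[(e["source"], e["kind"])].append(e)
    return [
        {
            "source": source,
            "kind": kind,
            "count": len(members),                         # statistical summary
            "first_seen": min(m["time"] for m in members),
            "last_seen": max(m["time"] for m in members),
        }
        for (source, kind), members in groups.items()
    ]

raw = [{"source": "server-S", "kind": "cpu_overload", "time": t} for t in (1.0, 2.0, 3.0)]
print(aggregate(raw))  # one aggregate with count=3 instead of three events
```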
Event de-duplication is a special type of event aggregation that consists of merging exact duplicates of the same event. Such duplicates may be caused by network instability (e.g., the same event is sent twice by the event source because the first instance was not acknowledged sufficiently quickly, but both instances eventually reach the event destination).
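A de-duplication pass might look like the sketch below, which keeps the first instance of an event and drops identical copies arriving shortly afterwards. The field names and the time window are illustrative assumptions.

```python
def deduplicate(events: list[dict], window: float = 5.0) -> list[dict]:
    """Merge exact duplicates (same source, kind and message) that arrive
    within `window` seconds of the kept instance. Fields are illustrative."""
    last_kept: dict[tuple, float] = {}
    unique = []
    for e in sorted(events, key=lambda ev: ev["time"]):
        key = (e["source"], e["kind"], e["message"])
        if key in last_kept and e["time"] - last_kept[key] < window:
            continue  # duplicate, e.g. a retransmission after a slow acknowledgement
        last_kept[key] = e["time"]
        unique.append(e)
    return unique

raw = [
    {"source": "server-S", "kind": "link_down", "message": "eth0", "time": 0.0},
    {"source": "server-S", "kind": "link_down", "message": "eth0", "time": 1.5},  # retransmit
]
print(deduplicate(raw))  # the duplicate at t=1.5 is merged away
```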
Event masking (also known as topological masking in network management) consists of ignoring events pertaining to systems that are downstream of a failed system. For example, servers that are downstream of a crashed router will fail availability polling.
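The sketch below illustrates the idea with a hard-coded topology: events from elements known to sit behind a failed element are suppressed. The topology table and all event fields are invented for the example.

```python
# Topological masking sketch: suppress events from systems downstream of a
# failed element.
DOWNSTREAM = {
    "router-R": {"server-S1", "server-S2"},  # servers reached through router-R
}

def mask(events: list[dict], failed: set[str]) -> list[dict]:
    """Drop events whose source sits behind a failed upstream element."""
    shadowed: set[str] = set()
    for node in failed:
        shadowed |= DOWNSTREAM.get(node, set())
    return [e for e in events if e["source"] not in shadowed]

events = [
    {"source": "router-R", "kind": "node_down"},
    {"source": "server-S1", "kind": "poll_timeout"},  # symptom of the crash above
    {"source": "server-S2", "kind": "poll_timeout"},  # symptom of the crash above
]
failed = {e["source"] for e in events if e["kind"] == "node_down"}
print(mask(events, failed))  # only the router-R event remains
```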
Root cause analysis is the last and most complex step of event correlation. It consists of analyzing dependencies between events, based for instance on a model of the environment and dependency graphs, to detect whether some events can be explained by others. For example, if database D runs on server S and this server becomes persistently overloaded (CPU at 100% for an extended period), the event “the SLA for database D is no longer fulfilled” can be explained by the event “server S is persistently overloaded”.
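As an illustration of this step, the sketch below encodes the dependency “database D runs on server S” in a small table and discards any event that another event can explain. The dependency table, names and event fields are assumptions made for the example.

```python
# Root cause analysis sketch: discard events that another event explains,
# based on a dependency table.
DEPENDS_ON = {
    "database-D": {"server-S"},  # database D runs on server S
}

def root_causes(events: list[dict]) -> list[dict]:
    """Return only the events that no other event can explain."""
    troubled = {e["source"] for e in events}
    unexplained = []
    for e in events:
        if DEPENDS_ON.get(e["source"], set()) & troubled:
            continue  # e.g. D's SLA violation is explained by S's overload
        unexplained.append(e)
    return unexplained

events = [
    {"source": "database-D", "kind": "sla_violation"},
    {"source": "server-S", "kind": "cpu_overload"},
]
print(root_causes(events))  # only the server-S overload survives
```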
At this stage, the event correlator is left with at most a handful of events that need to be acted upon. Strictly speaking, event correlation ends here. In practice, however, the term is used more loosely: event correlators found on the market (e.g., in network management) sometimes also include problem-solving capabilities. For instance, they may trigger corrective actions or further investigations automatically.
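One way such action triggering could be wired up is sketched below, with a simple lookup from event kind to corrective action. The event kinds and actions are hypothetical.

```python
# Action-triggering sketch: map the kind of a surviving root-cause event to a
# corrective action.
ACTIONS = {
    "cpu_overload": "page the on-call engineer and open a ticket",
    "disk_full": "run the log rotation job",
}

def trigger(event: dict) -> str:
    """Choose a corrective action, falling back to further investigation."""
    return ACTIONS.get(event["kind"], "flag for further investigation")

print(trigger({"source": "server-S", "kind": "cpu_overload"}))
```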
The scope of ITIL is larger than that of integrated management. However, event correlation in ITIL is quite similar to event correlation in integrated management.
In the ITIL version 2 framework, event correlation spans three processes: Incident Management, Problem Management and Service Level Management.
In the ITIL version 3 framework, event correlation takes place in the Event Management process. The event correlator is called a correlation engine.
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource.
Unix security refers to the means of securing a Unix or Unix-like operating system. A secure environment is achieved not only by the design concepts of these operating systems, but also through vigilant user and administrative practices.
In science and engineering, root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. It is widely used in IT operations, manufacturing, telecommunications, industrial process control, accident analysis, medicine, healthcare industry, etc. Root cause analysis is a form of deductive inference since it requires an understanding of the underlying causal mechanisms of the potential root causes and the problem.
Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing, or CEP, consists of a set of concepts and techniques developed in the early 1990s for processing real-time events and extracting information from event streams as they arrive. The goal of complex event processing is to identify meaningful events in real-time situations and respond to them as quickly as possible.
Enterprise software, also known as enterprise application software (EAS), is computer software used to satisfy the needs of an organization rather than individual users. Such organizations include businesses, schools, interest-based user groups, clubs, charities, and governments. Enterprise software is an integral part of a (computer-based) information system; a collection of such software is called an enterprise system. These systems handle a number of operations in an organization to enhance the business and management reporting tasks. The systems must process the information at a relatively high speed and can be deployed across a variety of networks.
Systems management refers to enterprise-wide administration of distributed systems, including computer systems. Systems management is strongly influenced by network management initiatives in telecommunications. Application performance management (APM) technologies are now a subset of systems management. Productivity can be improved through event correlation, system automation and predictive analysis, all of which are now part of APM.
Security event management (SEM), and the related SIM and SIEM, are computer security disciplines that use data inspection tools to centralize the storage and interpretation of logs or events generated by other software running on a network.
In computer log management and intelligence, log analysis is an art and science seeking to make sense of computer-generated records. The process of creating such records is called data logging.
Security information management (SIM) is an information security industry term for the collection of data such as log files into a central repository for trend analysis.
Log management (LM) comprises an approach to dealing with large volumes of computer-generated log messages.
In the fields of computer security and information technology, computer security incident management involves the monitoring and detection of security events on a computer or computer network, and the execution of proper responses to those events. Computer security incident management is a specialized form of incident management, the primary purpose of which is the development of a well understood and predictable response to damaging events and computer intrusions.
Prelude SIEM is a security information and event management (SIEM) system.
TriGeo Network Security is a United States–based provider of security information and event management (SIEM) technology. The company helps midmarket organizations proactively protect networks and data from internal and external threats with a SIEM appliance that provides real-time log management and automated network defense, from the perimeter to the endpoint.
Security information and event management (SIEM) is a field within computer security in which software products and services combine security information management (SIM) and security event management (SEM). They provide real-time analysis of security alerts generated by applications and network hardware. Vendors sell SIEM as software, as appliances, or as managed services; these products are also used to log security data and generate reports for compliance purposes. The term and the initialism SIEM were coined by Mark Nicolett and Amrit Williams of Gartner in 2005.
SAP Logon Tickets represent user credentials in SAP systems. When enabled, users can access multiple SAP applications and services through SAP GUI and web browsers without further username and password input. SAP Logon Tickets can also be a vehicle for enabling single sign-on across SAP boundaries; in some cases, logon tickets can be used to authenticate into third-party applications such as Microsoft-based web applications.
An information security operations center is a facility where enterprise information systems are monitored, assessed, and defended.
ZENworks, a suite of software products developed and maintained by Micro Focus International for computer systems management, aims to manage the entire life cycle of servers, desktop PCs, laptops, and handheld devices such as Android and iOS mobile phones and tablets. As of 2011, Novell planned to include Full Disk Encryption (FDE) functionality within ZENworks. ZENworks supports multiple server platforms and multiple directory services.
Event Management, as defined by ITIL, is the process that monitors all events that occur through the IT infrastructure. It allows for normal operation and also detects and escalates exception conditions.
Threat Intelligence Platform (TIP) is an emerging technology discipline that helps organizations aggregate, correlate, and analyze threat data from multiple sources in real time to support defensive actions. TIPs have evolved to address the growing amount of data generated by a variety of internal and external resources (such as system logs and threat intelligence feeds) and help security teams identify the threats that are relevant to their organization. By importing threat data from multiple sources and formats, correlating that data, and then exporting it into an organization’s existing security systems or ticketing systems, a TIP automates proactive threat management and mitigation. A true TIP differs from typical enterprise security products in that it is a system that can be programmed by outside developers, in particular, users of the platform. TIPs can also use APIs to gather data to generate configuration analysis, Whois information, reverse IP lookup, website content analysis, name servers, and SSL certificates.
NXLog is a multi-platform log collection and centralization tool that offers log processing features, including log enrichment and log forwarding. In concept, NXLog is similar to syslog-ng or Rsyslog, but it is not limited to UNIX and syslog: it supports all major operating systems, such as Windows, macOS and IBM AIX, and is compatible with many SIEM and log analytics suites and other platforms. NXLog can handle different log sources and formats, so it can be used to implement a centralized, scalable logging system. NXLog Community Edition is proprietary and can be downloaded free of charge with no license costs or limitations.