Event monitoring

In computer science, event monitoring is the process of collecting, analyzing, and signaling event occurrences to subscribers such as operating system processes, active database rules, and human operators. These event occurrences may stem from arbitrary sources in both software and hardware, such as operating systems, database management systems, application software, and processors. Event monitoring may use a time series database.

Basic concepts

Event monitoring makes use of a logical bus to transport event occurrences from sources to subscribers: event sources signal event occurrences onto the bus, and event subscribers receive them. An event bus can be distributed over a set of physical nodes such as standalone computer systems. Typical examples of event buses are found in graphical systems such as the X Window System and Microsoft Windows, as well as in development tools such as SDT.
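
The logical bus just described can be summarized in a minimal sketch. The names used here (EventBus, subscribe, signal) are illustrative assumptions and are not taken from any of the systems mentioned; a distributed bus would add transport between nodes, which is omitted.

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal logical event bus: sources signal occurrences, subscribers receive them."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type: str, callback: Callable[[dict], Any]) -> None:
        # An event subscriber registers interest in one type of event occurrence.
        self._subscribers[event_type].append(callback)

    def signal(self, event_type: str, occurrence: dict) -> None:
        # An event source signals an occurrence to all subscribers of that type.
        for callback in self._subscribers[event_type]:
            callback(occurrence)


# Usage: a subscriber (standing in for a human operator) receives an
# occurrence signalled by a source.
bus = EventBus()
bus.subscribe("disk_full", lambda occ: print("operator notified:", occ))
bus.signal("disk_full", {"host": "db1", "free_bytes": 0})
```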

Event collection is the process of collecting event occurrences in a filtered event log for analysis. A filtered event log contains those logged event occurrences that may be of meaningful use in the future; this implies that event occurrences can be removed from the filtered event log once they are of no further use. Event log analysis is the process of analyzing the filtered event log to aggregate event occurrences or to decide whether or not an event occurrence should be signalled. Event signalling is the process of signalling event occurrences over the event bus.
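
The following sketch illustrates these three steps under simple assumptions of my own: the filter is a predicate, entries are considered useless once they fall outside a retention window, and analysis is reduced to a single threshold decision. The collect method could, for instance, be subscribed to an event bus like the one sketched above.

```python
import time

class FilteredEventLog:
    """Event collection sketch: keep only occurrences that pass a filter,
    and prune entries once they can no longer be useful (here: too old)."""

    def __init__(self, keep, retention_seconds):
        self._keep = keep                    # filter predicate
        self._retention = retention_seconds  # how long an entry stays useful
        self.entries = []

    def collect(self, occurrence: dict) -> None:
        if self._keep(occurrence):
            self.entries.append(occurrence)
        cutoff = time.time() - self._retention
        self.entries = [e for e in self.entries if e["timestamp"] >= cutoff]


def analyze(log: FilteredEventLog, threshold: int) -> bool:
    """Event log analysis reduced to one decision: signal a composite
    'error burst' occurrence if enough errors remain in the log."""
    return len(log.entries) >= threshold


log = FilteredEventLog(keep=lambda occ: occ["severity"] == "error",
                       retention_seconds=60.0)
log.collect({"severity": "error", "timestamp": time.time()})
log.collect({"severity": "info", "timestamp": time.time()})   # filtered out
print(analyze(log, threshold=1))  # True: an 'error burst' would be signalled
```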

Something that is monitored is denoted the monitored object; for example, an application, an operating system, a database, or a piece of hardware can be a monitored object. A monitored object must be properly conditioned with event sensors to enable event monitoring; that is, an object must be instrumented with event sensors to be a monitored object. Event sensors are sensors that signal event occurrences whenever an event occurs. Whenever something is monitored, the probe effect must be managed.
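
As a rough illustration of instrumentation, the hypothetical decorator below turns every call of a wrapped operation into a signalled event occurrence; the parameter names and the use of a decorator are assumptions for the sketch, not a prescribed technique.

```python
import functools
import time

def event_sensor(signal, event_type):
    """Wrap an operation so that each invocation signals an event occurrence.
    `signal` is any callable that forwards occurrences to an event bus."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            signal(event_type, {"operation": func.__name__,
                                "timestamp": time.time()})
            return result
        return wrapper
    return decorator


# Usage: print() stands in for signalling over the bus.
@event_sensor(lambda etype, occ: print("signalled:", etype, occ), "query_executed")
def run_query(sql):
    return f"rows for {sql}"

run_query("SELECT 1")
```

Note that the wrapper adds work to every call of the monitored operation, which is precisely the probe effect discussed next.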

Monitored objects and the probe effect

As discussed by Gait,[1] when an object is monitored, its behavior is changed. This poses a particular problem in concurrent systems, in which processes can run in parallel: whenever sensors are introduced into the system, processes may execute in a different order. This can cause a problem if, for example, we are trying to localize a fault, and by monitoring the system we change its behavior in such a way that the fault does not result in a failure; in essence, the fault can be masked by monitoring the system. The probe effect is the difference in behavior between a monitored object and its un-instrumented counterpart.

According to Schütz,[2] we can avoid, compensate for, or ignore the probe effect. In critical real-time systems, in which timeliness (i.e., the ability of a system to meet time constraints such as deadlines) is significant, avoidance is the only option: if we, for example, instrument a system for testing and then remove the instrumentation before delivery, this invalidates the results of most tests performed on the complete system. In less critical real-time systems (e.g., media-based systems), compensation can be acceptable, for example, in performance testing. In non-concurrent systems, ignoring the probe effect is acceptable, since the behavior with respect to the order of execution is unchanged.

Event log analysis

Event log analysis is known as event composition in active databases, as chronicle recognition in artificial intelligence, and as real-time logic evaluation in real-time systems. Essentially, event log analysis is used for pattern matching, filtering of event occurrences, and aggregation of event occurrences into composite event occurrences. Dynamic programming strategies are commonly employed to save the results of previous analyses for future use, since, for example, the same pattern may be matched against the same event occurrences in several consecutive rounds of analysis. In contrast to general rule processing (employed to assert new facts from other facts, cf. inference engine), which is usually based on backtracking techniques, event log analysis algorithms are commonly greedy; for example, once a composite event is said to have occurred, this fact is never revoked, as it might be in a backtracking-based algorithm.
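
The sketch below shows this greedy, cache-based style of analysis for one invented composite pattern, "A followed by B within a time window". The event types, field names, and window rule are assumptions for illustration; cached partial matches play the role of the saved intermediate results, and a reported composite is never revoked.

```python
def detect_composites(occurrences, window):
    """Yield composite occurrences 'A followed by B within `window` seconds'.
    Partial matches are cached between occurrences so earlier analysis work
    is reused; reported composites are never revoked (greedy, no backtracking)."""
    pending_a = []  # cached partial matches: A occurrences still awaiting a B
    for occ in occurrences:  # occurrences assumed ordered by timestamp
        # Drop partial matches that can no longer complete within the window.
        pending_a = [a for a in pending_a
                     if occ["timestamp"] - a["timestamp"] <= window]
        if occ["type"] == "A":
            pending_a.append(occ)
        elif occ["type"] == "B":
            for a in pending_a:
                yield (a, occ)  # composite event occurrence


events = [{"type": "A", "timestamp": 1.0},
          {"type": "B", "timestamp": 2.5},
          {"type": "B", "timestamp": 9.0}]  # too late to pair with the A
print(list(detect_composites(events, window=5.0)))
```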

Several mechanisms have been proposed for event log analysis: finite-state automata, Petri nets, procedural approaches (based on either an imperative or an object-oriented programming language), a modification of the Boyer–Moore string-search algorithm, and simple temporal networks.
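
Of these mechanisms, the finite-state automaton is the simplest to sketch. The composite event and its three-state recognizer below are invented for illustration and do not come from any of the cited approaches.

```python
def fsa_detect(occurrences):
    """Recognize the composite event 'three consecutive login_failed occurrences'
    with a three-state automaton; any other occurrence resets the state."""
    state = 0
    for occ in occurrences:
        if occ == "login_failed":
            state += 1
            if state == 3:            # accepting state reached
                yield "possible_intrusion"
                state = 0
        else:
            state = 0                 # reset on any non-matching occurrence


print(list(fsa_detect(["login_failed", "login_failed", "login_ok",
                       "login_failed", "login_failed", "login_failed"])))
# ['possible_intrusion']
```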

Related Research Articles

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. "Understanding" in this context signifies the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

Telemetry

Telemetry is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment (telecommunication) for monitoring. The word is derived from the Greek roots tele, 'far off', and metron, 'measure'. Systems that need external instructions and data to operate require the counterpart of telemetry: telecommand.

Control system

A control system manages, commands, directs, or regulates the behavior of other devices or systems using control loops. It can range from a single home heating controller using a thermostat controlling a domestic boiler to large industrial control systems used for controlling processes or machines. Control systems are designed via the control engineering process.

Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them. Complex event processing (CEP) consists of a set of concepts and techniques developed in the early 1990s for processing real-time events and extracting information from event streams as they arrive. The goal of complex event processing is to identify meaningful events in real-time situations and respond to them as quickly as possible.

Prognostics is an engineering discipline focused on predicting the time at which a system or a component will no longer perform its intended function. This lack of performance is most often a failure beyond which the system can no longer be used to meet desired performance. The predicted time then becomes the remaining useful life (RUL), which is an important concept in decision making for contingency mitigation. Prognostics predicts the future performance of a component by assessing the extent of deviation or degradation of a system from its expected normal operating conditions. The science of prognostics is based on the analysis of failure modes, detection of early signs of wear and aging, and fault conditions. An effective prognostics solution is implemented when there is sound knowledge of the failure mechanisms that are likely to cause the degradations leading to eventual failures in the system. It is therefore necessary to have initial information on the possible failures in a product. Such knowledge is important to identify the system parameters that are to be monitored. A potential use for prognostics is in condition-based maintenance. The discipline that links studies of failure mechanisms to system lifecycle management is often referred to as prognostics and health management (PHM), sometimes also system health management (SHM) or, in transportation applications, vehicle health management (VHM) or engine health management (EHM). Technical approaches to building models in prognostics can be categorized broadly into data-driven approaches, model-based approaches, and hybrid approaches.

Sensor fusion

Sensor fusion is a process of combining sensor data or data derived from disparate sources so that the resulting information has less uncertainty than would be possible if these sources were used individually. For instance, one could potentially obtain a more accurate location estimate of an indoor object by combining multiple data sources such as video cameras and WiFi localization signals. The term uncertainty reduction in this case can mean more accurate, more complete, or more dependable, or refer to the result of an emerging view, such as stereoscopic vision.
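
A common textbook way to make the uncertainty-reduction claim concrete is inverse-variance weighting of two independent estimates; the function and the example values below are illustrative only.

```python
def fuse(x1, var1, x2, var2):
    """Inverse-variance weighting of two independent estimates; the fused
    variance is smaller than either input variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    return fused, 1.0 / (w1 + w2)


# e.g. a camera-based position estimate fused with a WiFi-based one
print(fuse(10.2, 0.5, 9.8, 1.0))  # fused variance ~0.33, below both inputs
```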

Electric power quality is the degree to which the voltage, frequency, and waveform of a power supply system conform to established specifications. Good power quality can be defined as a steady supply voltage that stays within the prescribed range, steady AC frequency close to the rated value, and smooth voltage curve waveform. In general, it is useful to consider power quality as the compatibility between what comes out of an electric outlet and the load that is plugged into it. The term is used to describe electric power that drives an electrical load and the load's ability to function properly. Without the proper power, an electrical device may malfunction, fail prematurely or not operate at all. There are many ways in which electric power can be of poor quality, and many more causes of such poor quality power.

Structural health monitoring (SHM) involves the observation and analysis of a system over time using periodically sampled response measurements to monitor changes to the material and geometric properties of engineering structures such as bridges and buildings.

In computer log management and intelligence, log analysis is an art and science seeking to make sense of computer-generated records. The process of creating such records is called data logging.

Brake-by-wire

Brake-by-wire technology in the automotive industry is the ability to control brakes through electronic means, without a mechanical connection that transfers force to the physical braking system from a driver input apparatus such as a pedal or lever.

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Automatic target recognition (ATR) is the ability for an algorithm or device to recognize targets or other objects based on data obtained from sensors.

A flame detector is a sensor designed to detect and respond to the presence of a flame or fire, allowing flame detection. Responses to a detected flame depend on the installation, but can include sounding an alarm, deactivating a fuel line, and activating a fire suppression system. When used in applications such as industrial furnaces, their role is to provide confirmation that the furnace is working properly; they can be used to turn off the ignition system, though in many cases they take no direct action beyond notifying the operator or control system. A flame detector can often respond faster and more accurately than a smoke or heat detector due to the mechanisms it uses to detect the flame.

Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.

Fault detection, isolation, and recovery (FDIR) is a subfield of control engineering which concerns itself with monitoring a system, identifying when a fault has occurred, and pinpointing the type of fault and its location. Two approaches can be distinguished: A direct pattern recognition of sensor readings that indicate a fault and an analysis of the discrepancy between the sensor readings and expected values, derived from some model. In the latter case, it is typical that a fault is said to be detected if the discrepancy or residual goes above a certain threshold. It is then the task of fault isolation to categorize the type of fault and its location in the machinery. Fault detection and isolation (FDI) techniques can be broadly classified into two categories. These include model-based FDI and signal processing based FDI.
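
The model-based variant reduces, in its simplest form, to a residual check; the function, names, and threshold below are a sketch under that assumption, not a prescribed FDI method.

```python
def detect_fault(measured, expected, threshold):
    """Model-based detection sketch: flag a fault when the residual between
    the sensor reading and the model's expected value exceeds a threshold."""
    residual = abs(measured - expected)
    return residual > threshold


print(detect_fault(measured=7.4, expected=5.0, threshold=2.0))  # True -> fault detected
```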

Event-driven SOA is a form of service-oriented architecture (SOA), combining the intelligence and proactiveness of event-driven architecture with the organizational capabilities found in service offerings. Before event-driven SOA, the typical SOA platform orchestrated services centrally through pre-defined business processes, assuming that whatever should be triggered is already defined in a business process. This older approach does not account for events that occur across, or outside of, specific business processes. Thus complex events, in which a pattern of activities (both scheduled and non-scheduled) should trigger a set of services, are not accounted for in traditional SOA 1.0 architecture.

Database activity monitoring is a database security technology for monitoring and analyzing database activity. DAM may combine data from network-based monitoring and native audit information to provide a comprehensive picture of database activity. The data gathered by DAM is used to analyze and report on database activity, support breach investigations, and alert on anomalies. DAM is typically performed continuously and in real-time.

High performance computing applications running on massively parallel supercomputers consist of concurrent programs designed using multi-threaded, multi-process models. The applications may consist of various constructs with varying degrees of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as sequential programs, unlike sequential programs they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs. Race conditions, data races, deadlocks, missed signals and livelock are common error types.

This is a list of the individual topics in Electronics, Mathematics, and Integrated Circuits that together make up the Computer Engineering field. The organization is by topic to create an effective Study Guide for this field. The contents match the full body of topics and detail information expected of a person identifying themselves as a Computer Engineering expert as laid out by the National Council of Examiners for Engineering and Surveying. It is a comprehensive list and superset of the computer engineering topics generally dealt with at any one time.

References

  1. J. Gait (1985). "A debugger for concurrent programs". Software: Practice and Experience, 15(6).
  2. W. Schütz (1994). "Fundamental issues in testing distributed real-time systems". Real-Time Systems, 7(2):129–157.