Streaming conformance checking


Streaming conformance checking is a form of conformance checking in which deviations, if any, are reported directly when they happen. Instead of an event log, streaming conformance checking techniques take an event stream and a process model as input, and each event received from the stream is compared against the model.


Differences from conventional conformance checking

Conventional conformance checking techniques use an event log as input. An event log is a static data source that records business activities over a time span. Only after the event log has been completely recorded are conformance checking techniques applied and the deviations, if they exist, reported. A major drawback of this approach is that deviations are detected only after the fact, when it is too late to intervene in the running process.

Streaming conformance checking techniques, on the other hand, use an event stream as input. An event stream is a continuous stream of events executed in the context of an underlying business process. [1] Each event from the stream is denoted as (c, a), where c is the case identifier and a is the activity name of the event.
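
To make the notation concrete, the following is a minimal Python sketch of such an event stream; the case identifiers and activity names are purely illustrative. Note that events of different cases may interleave on the stream.

    # A hypothetical event stream: events arrive one at a time as (case id, activity)
    # pairs, and events of different cases may interleave.
    stream = [
        ("order-17", "register"),
        ("order-18", "register"),
        ("order-17", "check"),
        ("order-18", "decide"),
        ("order-17", "decide"),
    ]

    for c, a in stream:
        print(f"received event: case={c}, activity={a}")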

With this kind of data, conformance checking can be performed continuously along the stream: for each executed activity, it is immediately determined whether that activity causes a deviation with respect to a given process model. This kind of conformance checking therefore provides a continuous way of monitoring a process and detecting deviations in real time.

Algorithms

The fundamental difference between online and offline conformance checking is the completeness of the input. The behavior recorded in an event log is complete, i.e., for each event it is known whether the corresponding case is still running or has already finished. This is not the case with an event stream: at the moment an activity of a case is executed, it is unknown whether the case will ever stop or is already complete, i.e., whether any future event will still belong to this case. Due to this difference, conventional conformance checking algorithms are not (fully) applicable in the online setting and need to be adjusted.

Footprints

Input: An event stream and the footprint matrix of the corresponding process model.

Algorithm:

For each received event (c, a), check whether the directly-follows relation between the previously observed activity of case c and a is allowed by the footprint matrix; if it is not, report a deviation for case c (see the sketch below).
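
A minimal Python sketch of this check follows, under the assumption that the footprint information is reduced to a set ALLOWED_DF of allowed directly-follows pairs; the names and activities are illustrative and not taken from a particular library.

    from collections import defaultdict

    # Hypothetical, simplified footprint information: the set of directly-follows
    # pairs that the model allows. A full footprint matrix would additionally
    # distinguish causality, parallelism and choice.
    ALLOWED_DF = {("register", "check"), ("check", "decide"), ("decide", "notify")}

    last_activity = {}                 # case id -> last activity observed for that case
    deviations = defaultdict(list)     # case id -> list of disallowed pairs

    def receive(case, activity):
        """Process one event (c, a) and report a deviation as soon as it occurs."""
        prev = last_activity.get(case)
        if prev is not None and (prev, activity) not in ALLOWED_DF:
            deviations[case].append((prev, activity))
            print(f"case {case}: '{prev}' -> '{activity}' is not allowed by the footprints")
        last_activity[case] = activity

    # Example: the second event skips 'check', so a deviation is reported immediately.
    for c, a in [("1", "register"), ("1", "decide"), ("1", "notify")]:
        receive(c, a)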

Token-based replay

Input: An event stream and a Petri net model.

Algorithm:

For each received event (c, a), replay a on the Petri net marking maintained for case c; if the corresponding transition is not enabled, record the missing tokens as a deviation (see the sketch below).
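
A minimal Python sketch of this idea is given below; the Petri net is reduced to a mapping from transition labels to their input and output places, and the net, places and activities are purely illustrative assumptions.

    from collections import Counter, defaultdict

    # Hypothetical Petri net: transition label -> (input places, output places).
    NET = {
        "register": (["start"], ["p1"]),
        "check":    (["p1"],    ["p2"]),
        "decide":   (["p2"],    ["end"]),
    }
    INITIAL_MARKING = Counter({"start": 1})

    markings = defaultdict(lambda: INITIAL_MARKING.copy())   # case id -> current marking
    missing = Counter()                                      # case id -> artificially inserted tokens

    def receive(case, activity):
        """Replay one event (c, a) on the marking of its case; missing tokens signal deviations."""
        marking = markings[case]
        inputs, outputs = NET[activity]
        for p in inputs:
            if marking[p] > 0:
                marking[p] -= 1                              # consume a token
            else:
                missing[case] += 1                           # token had to be inserted artificially
                print(f"case {case}: '{activity}' is not enabled, missing token in place '{p}'")
        for p in outputs:
            marking[p] += 1                                  # produce tokens

    # Example: 'check' is skipped, so 'decide' is not enabled and a deviation is reported.
    for c, a in [("1", "register"), ("1", "decide")]:
        receive(c, a)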

Temporal profile

A temporal profile records, for each pair of activities, the average time between their occurrences and the standard deviation of these durations. [2]

Input: An event stream and a temporal profile model.

Algorithm:

For each received event (c, a), compare the time elapsed since the earlier activities of case c with the mean and standard deviation stored in the temporal profile; durations that deviate by more than a chosen number of standard deviations are reported (see the sketch below).
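
A minimal Python sketch of this check follows; the temporal profile is assumed to be a mapping from activity pairs to a mean duration and a standard deviation, and PROFILE, ZETA and the activity names are illustrative assumptions.

    from collections import defaultdict

    # Hypothetical temporal profile: activity pair -> (mean seconds between them, standard deviation).
    PROFILE = {("register", "decide"): (3600.0, 600.0)}
    ZETA = 3.0                               # number of standard deviations tolerated

    history = defaultdict(list)              # case id -> [(activity, timestamp), ...]

    def receive(case, activity, timestamp):
        """Compare the time since earlier activities of the case with the temporal profile."""
        for prev_act, prev_ts in history[case]:
            stats = PROFILE.get((prev_act, activity))
            if stats is None:
                continue
            mean, std = stats
            duration = timestamp - prev_ts
            if abs(duration - mean) > ZETA * std:
                print(f"case {case}: '{prev_act}' -> '{activity}' took {duration:.0f} s, "
                      f"outside {mean} +/- {ZETA} * {std}")
        history[case].append((activity, timestamp))

    receive("1", "register", 0.0)
    receive("1", "decide", 9000.0)           # 5400 s above the mean -> deviation reported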

Log skeleton

A log skeleton consists of constraints that describe the relationships between activities in a process. [3]

Input: An event stream and a log skeleton model.

Algorithm:

For each received event (c, a), check the constraints of the log skeleton against the activities observed so far for case c and report any constraint that is violated (see the sketch below).
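
A minimal Python sketch of such a check is shown below. It covers only two kinds of constraints ("always before" and "never together"), i.e., constraints that can already be violated by a trace prefix; the constraint sets and activity names are illustrative assumptions.

    from collections import defaultdict

    # Hypothetical log-skeleton constraints (only a small subset of the full model):
    ALWAYS_BEFORE = {"decide": {"register", "check"}}     # 'decide' requires these earlier in the case
    NEVER_TOGETHER = {frozenset({"approve", "reject"})}   # activities that may not co-occur in a case

    seen = defaultdict(set)                                # case id -> activities observed so far

    def receive(case, activity):
        """Check a subset of log-skeleton constraints for one event (c, a)."""
        required = ALWAYS_BEFORE.get(activity, set())
        not_yet_seen = required - seen[case]
        if not_yet_seen:
            print(f"case {case}: '{activity}' occurred before required activities {not_yet_seen}")
        for other in seen[case]:
            if frozenset({other, activity}) in NEVER_TOGETHER:
                print(f"case {case}: '{activity}' and '{other}' must never occur together")
        seen[case].add(activity)

    # Example: 'decide' arrives before 'check', and 'approve' and 'reject' co-occur.
    for c, a in [("1", "register"), ("1", "decide"), ("1", "approve"), ("1", "reject")]:
        receive(c, a)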


References

  1. van Zelst, Sebastiaan J.; van Dongen, Boudewijn F.; van der Aalst, Wil M. P. (2017-05-15). "Event stream-based process discovery using abstract representations". Knowledge and Information Systems. 54 (2): 407–435. arXiv:1704.08101. doi:10.1007/s10115-017-1060-2. ISSN 0219-1377. S2CID 13301390.
  2. Stertz, Florian; Mangler, Juergen; Rinderle-Ma, Stefanie (2020-08-17). "Temporal Conformance Checking at Runtime based on Time-infused Process Models". arXiv:2008.07262 [cs.SE].
  3. Verbeek, H. M. W.; de Carvalho, R. Medeiros (2018-06-21). "Log Skeletons: A Classification Approach to Process Discovery". arXiv:1806.08247 [cs.AI].