Inductive miner

Last updated

Inductive miner belongs to a class of algorithms used in process discovery. [1] Various algorithms proposed previously give process models of slightly different type from the same input. The quality of the output model depends on the soundness of the model. A number of techniques such as alpha miner, genetic miner, work on the basis of converting an event log into a workflow model, however, they do not produce models that are sound all the time. Inductive miner relies on building a directly follows graph from event log and using this graph to detect various process relations. [2]

Contents

Definitions

A directly follows graph is a directed graph that connects an activity A to another activity B if and only if activity B occurs chronologically right after activity A for any given case in the respective event log. A directly follows graph is represented mathematically by:

[2]

Where

(directly follows relation) If some activity a is ever the first activity, then . Similarly, if some activity a is ever the last activity, then

the set of all "starting nodes"

the set of all "ending nodes"

The inductive miner technique relies on the detection of various cuts on the directly follows graph created using the event log. The core idea behind inductive miner lies in the unique methodology of discovering various divisions of the arcs in the directly follows graph, and using the smaller components after division to represent the execution sequence of the activities. The inductive miner algorithm uses the directly follows graph to detect one of the following cuts. [2]

is an exclusive OR cut iff:

XOR cut - inductive miner Xor-for-inductive-miner.png
XOR cut - inductive miner

is a sequence cut iff:

Sequence cut - inductive miner Sequence-for-inductive-miner.png
Sequence cut - inductive miner

is a parallel cut iff:

-

-

Parallel cut - inductive miner Parallel-for-inductive-miner.png
Parallel cut - inductive miner

is a redo loop cut iff:

-

-

-

-

-

Loop cut - inductive miner Loop-for-inductive-miner.png
Loop cut - inductive miner

Types

  1. Inductive miner with fall through: The complex event log sometimes would make it impossible to detect any cuts using the above techniques. In that case, there are additional fall throughs that can be applied to obtain better representation of process tree instead of a flower model. [3] [4]
  2. Inductive miner frequency-based: The less frequent relations in the event log sometimes creates problems in detecting any type of cuts. In that case, the directly follows relations below a certain threshold are removed from the directly follows graph and the resultant graph is used for detecting the cuts. [5]
  3. Inductive miner for big data: This includes an improvement on the existing inductive miner to handle big data sets.[ citation needed ]

Related Research Articles

<span class="mw-page-title-main">Travelling salesman problem</span> NP-hard problem in combinatorial optimization

In the theory of computational complexity, the travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

In axiomatic set theory and the branches of mathematics and philosophy that use it, the axiom of infinity is one of the axioms of Zermelo–Fraenkel set theory. It guarantees the existence of at least one infinite set, namely a set containing the natural numbers. It was first published by Ernst Zermelo as part of his set theory in 1908.

<span class="mw-page-title-main">Maximal and minimal elements</span> Element that is not ≤ (or ≥) any other element

In mathematics, especially in order theory, a maximal element of a subset of some preordered set is an element of that is not smaller than any other element in . A minimal element of a subset of some preordered set is defined dually as an element of that is not greater than any other element in .

<span class="mw-page-title-main">Random graph</span> Graph generated by a random process

In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection between graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Its practical applications are found in all areas in which complex networks need to be modeled – many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

In the foundations of mathematics, von Neumann–Bernays–Gödel set theory (NBG) is an axiomatic set theory that is a conservative extension of Zermelo–Fraenkel–choice set theory (ZFC). NBG introduces the notion of class, which is a collection of sets defined by a formula whose quantifiers range only over sets. NBG can define classes that are larger than sets, such as the class of all sets and the class of all ordinals. Morse–Kelley set theory (MK) allows classes to be defined by formulas whose quantifiers range over classes. NBG is finitely axiomatizable, while ZFC and MK are not.

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

Finite model theory is a subarea of model theory. Model theory is the branch of logic which deals with the relation between a formal language (syntax) and its interpretations (semantics). Finite model theory is a restriction of model theory to interpretations on finite structures, which have a finite universe.

<span class="mw-page-title-main">Maximal independent set</span> Independent set which is not a subset of any other independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

In computational complexity theory, Polynomial Local Search (PLS) is a complexity class that models the difficulty of finding a locally optimal solution to an optimization problem. The main characteristics of problems that lie in PLS are that the cost of a solution can be calculated in polynomial time and the neighborhood of a solution can be searched in polynomial time. Therefore it is possible to verify whether or not a solution is a local optimum in polynomial time. Furthermore, depending on the problem and the algorithm that is used for solving the problem, it might be faster to find a local optimum instead of a global optimum.

In recursion theory, α recursion theory is a generalisation of recursion theory to subsets of admissible ordinals . An admissible set is closed under functions, where denotes a rank of Godel's constructible hierarchy. is an admissible ordinal if is a model of Kripke–Platek set theory. In what follows is considered to be fixed.

An interpretation is an assignment of meaning to the symbols of a formal language. Many formal languages used in mathematics, logic, and theoretical computer science are defined in solely syntactic terms, and as such do not have any meaning until they are given some interpretation. The general study of interpretations of formal languages is called formal semantics.

Different definitions have been given for the dimension of a complex network or graph. For example, metric dimension is defined in terms of the resolving set for a graph. Dimension has also been defined based on the box covering method applied to graphs. Here we describe the definition based on the complex network zeta function. This generalises the definition based on the scaling property of the volume with distance. The best definition depends on the application.

<span class="mw-page-title-main">Betweenness centrality</span> Measure of a graphs centrality, based on shortest paths

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices, that is, there exists at least one path such that either the number of edges that the path passes through or the sum of the weights of the edges is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

<span class="mw-page-title-main">Dependency network</span>

The dependency network approach provides a system level analysis of the activity and topology of directed networks. The approach extracts causal topological relations between the network's nodes, and provides an important step towards inference of causal activity relations between the network nodes. This methodology has originally been introduced for the study of financial data, it has been extended and applied to other systems, such as the immune system, semantic networks, and functional brain networks.

Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world.

<span class="mw-page-title-main">Hyperbolic geometric graph</span>

A hyperbolic geometric graph (HGG) or hyperbolic geometric network (HGN) is a special type of spatial network where (1) latent coordinates of nodes are sprinkled according to a probability density function into a hyperbolic space of constant negative curvature and (2) an edge between two nodes is present if they are close according to a function of the metric (typically either a Heaviside step function resulting in deterministic connections between vertices closer than a certain threshold distance, or a decaying function of hyperbolic distance yielding the connection probability). A HGG generalizes a random geometric graph (RGG) whose embedding space is Euclidean.

Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function , if it is possible to construct a flow network with positive weights such that

Cederbaum's theorem defines hypothetical analog electrical networks which will automatically produce a solution to the minimum s–t cut problem. Alternatively, simulation of such a network will also produce a solution to the minimum s–t cut problem. This article gives basic definitions, a statement of the theorem and a proof of the theorem. The presentation in this article closely follows the presentation of the theorem in the original publication.

In set theory and logic, Buchholz's ID hierarchy is a hierarchy of subsystems of first-order arithmetic. The systems/theories are referred to as "the formal theories of ν-times iterated inductive definitions". IDν extends PA by ν iterated least fixed points of monotone operators.

References

  1. Wil van der Aalst (March 2010). "Process Discovery Capturing the Invisible". IEEE Computational Intelligence Magazine. 5: 28–41. doi:10.1109/MCI.2009.935307. S2CID   14520822.
  2. 1 2 3 Leemans, Sander J. J.; Fahland, Dirk; van der Aalst, Wil M. P. (2013). Colom, José-Manuel; Desel, Jörg (eds.). "Discovering Block-Structured Process Models from Event Logs - A Constructive Approach". Application and Theory of Petri Nets and Concurrency. Lecture Notes in Computer Science. 7927. Berlin, Heidelberg: Springer: 311–329. doi:10.1007/978-3-642-38697-8_17. ISBN   978-3-642-38697-8.
  3. Ghawi, Raji (2016). "Process Discovery using Inductive Miner and Decomposition". arXiv: 1610.07989 [cs.AI].
  4. Leemans, S. J. J. (2017-05-09). Robust process mining with guarantees - SIKS Dissertation Series No. 2017-12 (PDF). TU/e - Eindhoven University of Technology. ISBN   978-90-386-4257-4.
  5. Leemans, Sander J. J.; Fahland, Dirk; van der Aalst, Wil M. P. (2014). Lohmann, Niels; Song, Minseok; Wohed, Petia (eds.). "Discovering Block-Structured Process Models from Event Logs Containing Infrequent Behaviour". Business Process Management Workshops. Lecture Notes in Business Information Processing. 171. Cham: Springer International Publishing: 66–78. doi:10.1007/978-3-319-06257-0_6. ISBN   978-3-319-06257-0.