Process mining is a family of techniques used to analyze event data in order to understand and improve operational processes. Part of the fields of data science and process management, process mining is generally built on logs that contain a case ID (a unique identifier for a particular process instance), an activity (a description of the event that is occurring), a timestamp, and sometimes other information such as resources and costs. [1] [2]
There are three main classes of process mining techniques: process discovery, conformance checking, and process enhancement. In the past, terms like workflow mining and automated business process discovery (ABPD) [3] were used.
Process mining techniques are often used when no formal description of the process can be obtained by other approaches, or when the quality of existing documentation is questionable. [4] For example, applying process mining methodology to the audit trails of a workflow management system, the transaction logs of an enterprise resource planning system, or the electronic patient records in a hospital can result in models describing the processes of organizations. [5] Event log analysis can also be used to compare event logs with prior model(s) to understand whether the observations conform to a prescriptive or descriptive model. The event log data must be linked to a case ID, activities, and timestamps. [6] [7]
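For illustration, the minimal event log structure described above (case ID, activity, timestamp) can be sketched in plain Python; the case IDs, activity names, and timestamps below are hypothetical:

```python
from datetime import datetime

# Hypothetical event log: each event records a case ID, an activity, and a timestamp.
event_log = [
    {"case_id": "c1", "activity": "register request", "timestamp": datetime(2024, 1, 5, 9, 0)},
    {"case_id": "c2", "activity": "register request", "timestamp": datetime(2024, 1, 5, 9, 15)},
    {"case_id": "c1", "activity": "check ticket", "timestamp": datetime(2024, 1, 5, 9, 30)},
    {"case_id": "c1", "activity": "decide", "timestamp": datetime(2024, 1, 5, 10, 0)},
    {"case_id": "c2", "activity": "decide", "timestamp": datetime(2024, 1, 5, 11, 0)},
]

def traces_by_case(log):
    """Group events by case ID and order each case by timestamp."""
    cases = {}
    for event in sorted(log, key=lambda e: e["timestamp"]):
        cases.setdefault(event["case_id"], []).append(event["activity"])
    return cases

print(traces_by_case(event_log))
# {'c1': ['register request', 'check ticket', 'decide'], 'c2': ['register request', 'decide']}
```

Grouping and ordering events per case in this way yields the traces on which discovery and conformance techniques operate.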
Contemporary management trends such as BAM (business activity monitoring), BOM (business operations management), and BPI (business process intelligence) illustrate the interest in supporting diagnosis functionality in the context of business process management technology (e.g., workflow management systems and other process-aware information systems). Process mining is different from mainstream machine learning, data mining, and artificial intelligence techniques. For example, process discovery techniques in the field of process mining try to discover end-to-end process models that are able to describe sequential, choice, concurrent, and loop behavior. Conformance checking techniques are closer to optimization than to traditional learning approaches. However, process mining can be used to generate machine learning, data mining, and artificial intelligence problems. After discovering a process model and aligning the event log, it is possible to create basic supervised and unsupervised learning problems, for example, predicting the remaining processing time of a running case or identifying the root causes of compliance problems.
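As a rough sketch of how an event log can give rise to a supervised learning problem, the code below maps each prefix of a hypothetical completed case to the time remaining until the case ends, which could serve as training data for remaining-time prediction:

```python
from datetime import datetime

# Hypothetical completed case: (activity, timestamp) pairs in order of execution.
case = [
    ("register", datetime(2024, 1, 5, 9, 0)),
    ("check", datetime(2024, 1, 5, 9, 30)),
    ("decide", datetime(2024, 1, 5, 10, 0)),
]

def remaining_time_samples(case):
    """For each proper prefix of a completed case, pair the activities seen so
    far with the time (in minutes) remaining until the case ends: a supervised
    learning target for remaining-time prediction."""
    end = case[-1][1]
    samples = []
    for i in range(1, len(case)):
        prefix = [activity for activity, _ in case[:i]]
        remaining = (end - case[i - 1][1]).total_seconds() / 60
        samples.append((prefix, remaining))
    return samples

print(remaining_time_samples(case))
# [(['register'], 60.0), (['register', 'check'], 30.0)]
```

Each (prefix, remaining time) pair could then be fed to any regression technique; the encoding of prefixes into features is a separate design choice.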
The IEEE Task Force on Process Mining was established in October 2009 as part of the IEEE Computational Intelligence Society. [8] This vendor-neutral organization aims to promote the research, development, education, and understanding of process mining; make end-users, developers, consultants, and researchers aware of the state of the art in process mining; promote the use of process mining techniques and tools and stimulate new applications; play a role in standardization efforts for logging event data (e.g., XES); organize tutorials, special sessions, workshops, competitions, and panels; and develop material (papers, books, online courses, movies, etc.) to inform and guide people new to the field. The IEEE Task Force on Process Mining established the International Process Mining Conference (ICPM) series, [9] led the development of the IEEE XES standard for storing and exchanging event data, [10] [11] and wrote the Process Mining Manifesto, [12] which was translated into 16 languages.
The term "process mining" was first coined in a research proposal written by the Dutch computer scientist Wil van der Aalst. Thus a new field of research emerged under the umbrella of data science and process science at Eindhoven University of Technology in 1999. In the early days, process mining techniques were often conflated with workflow management techniques. In 2000, the first practically applicable algorithm for process discovery, the "Alpha miner", was developed. The next year, 2001, a similar algorithm based on heuristics, the "Heuristic miner", was introduced in research papers. Later, more powerful algorithms such as the inductive miner were developed for process discovery. As the field evolved, conformance checking became an integral part of it. The year 2004 marked the development of token-based replay for conformance checking. Beyond the mainstream techniques of process discovery and conformance checking, process mining branched out into multiple areas, leading to the development of performance analysis, decision mining, and organizational mining in 2005 and 2006. In 2007, the first commercial process mining company, "Futura Pi", was established. The IEEE Task Force on Process Mining, a governing body, was formed in 2009 and began to oversee the norms and standards related to process mining. Further techniques were developed for conformance checking, leading to the publication of alignment-based conformance checking in 2010. In 2011, the first process mining book was published. In 2014, a MOOC on process mining was offered by Coursera. By 2018, more than 30 commercially available process mining tools were on the market. The year 2019 marked the first process mining conference.
Today, more than 35 vendors offer tools and techniques for process discovery and conformance checking.
Process mining should be viewed as a bridge between data science and process science. It focuses on transforming event logs into a meaningful representation of the process, which can give rise to several data science and machine learning problems.
There are three categories of process mining techniques: process discovery, conformance checking, and process enhancement.
Process mining software helps organizations analyze and visualize their business processes based on data extracted from various sources, such as transaction logs or event data. This software can identify patterns, bottlenecks, and inefficiencies within a process, enabling organizations to improve their operational efficiency, reduce costs, and enhance their customer experience.
In March 2023, the magazine Analytics Insight identified the top five process mining software companies for 2023: [20]
Gartner also provided a list of the best process mining tools for 2024 and released the updated 2024 Gartner® Magic Quadrant™ for Process Mining Platforms: [21] [22]
Workflow is a generic term for orchestrated and repeatable patterns of activity, enabled by the systematic organization of resources into processes that transform materials, provide services, or process information. It can be depicted as a sequence of operations, the work of a person or group, the work of an organization of staff, or one or more simple or complex mechanisms.
The term process model is used in various contexts. For example, in business process modeling the enterprise process model is often referred to as the business process model.
An event-driven process chain (EPC) is a type of flow chart for business process modeling. EPC can be used to configure enterprise resource planning execution, and for business process improvement. It can be used to control an autonomous workflow instance in work sharing.
A workflow pattern is a specialized form of design pattern as defined in the area of software engineering or business process engineering. Workflow patterns refer specifically to recurrent problems and proven solutions related to the development of workflow applications in particular, and more broadly, process-oriented applications.
The XML Process Definition Language (XPDL) is a format standardized by the Workflow Management Coalition (WfMC) to interchange business process definitions between different workflow products, i.e. between different modeling tools and management suites. XPDL defines an XML schema for specifying the declarative part of workflow / business process.
Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus time series mining is closely related, but usually considered a different activity. Sequential pattern mining is a special case of structured data mining.
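A minimal sketch of the idea, assuming sequences of discrete symbols: count ordered pairs (a occurring before b within a sequence) and keep those supported by enough sequences. Real sequential pattern mining algorithms (e.g., of the apriori family) generalize this to longer patterns:

```python
from itertools import combinations

def frequent_pairs(sequences, min_support=2):
    """Find ordered pairs (a, b), with a before b in a sequence, that occur
    in at least `min_support` of the input sequences."""
    counts = {}
    for seq in sequences:
        # Count each pair at most once per sequence (support = #sequences).
        seen = set()
        for i, j in combinations(range(len(seq)), 2):
            seen.add((seq[i], seq[j]))
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

sequences = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
print(frequent_pairs(sequences))
```

With the hypothetical input above, ("a", "c") is supported by all three sequences and ("b", "c") by two, while the other pairs fall below the threshold.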
Business process discovery (BPD), related to business process management and process mining, is a set of techniques that manually or automatically construct a representation of an organisation's current business processes and their major process variations. These techniques use data recorded in the existing organisational methods of work, documentation, and technology systems that run business processes within an organisation. The type of data required for process discovery is called an event log: any data record that contains a case ID, an activity name, and a timestamp qualifies as an event log and can be used to discover the underlying process model. The event log can contain additional information related to the process, such as the resources executing the activity, the type or nature of the events, or any other relevant details. Process discovery aims to obtain a process model that describes the event log as closely as possible; the process model acts as a graphical representation of the process. The event logs used for discovery may contain noise, irregular information, and inconsistent or incorrect timestamps. Process discovery is challenging because of such noisy event logs and because the event log captures only part of the actual process hidden behind the system. Discovery algorithms must rely solely on the limited data provided by the event logs to develop the closest possible model of the actual behaviour.
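A common building block of discovery algorithms is the directly-follows graph. The sketch below, using hypothetical traces, counts directly-follows edges and then drops infrequent edges as likely noise, a simple stand-in for the noise handling that real discovery algorithms perform:

```python
def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = {}
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] = dfg.get((a, b), 0) + 1
    return dfg

def filter_noise(dfg, min_count=2):
    """Keep only edges observed at least `min_count` times."""
    return {edge: n for edge, n in dfg.items() if n >= min_count}

# Hypothetical traces extracted from an event log (one list per case).
traces = [["a", "b", "d"], ["a", "c", "d"], ["a", "b", "d"]]
dfg = directly_follows(traces)
print(dfg)
# {('a', 'b'): 2, ('b', 'd'): 2, ('a', 'c'): 1, ('c', 'd'): 1}
print(filter_noise(dfg))
# {('a', 'b'): 2, ('b', 'd'): 2}
```

The filtered graph retains only the frequent behaviour, at the cost of discarding rare but possibly legitimate paths; choosing that threshold is part of the discovery problem.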
Business process management (BPM) is the discipline in which people use various methods to discover, model, analyze, measure, improve, optimize, and automate business processes. Any combination of methods used to manage a company's business processes is BPM. Processes can be structured and repeatable or unstructured and variable. Though not required, enabling technologies are often used with BPM.
Willibrordus Martinus Pancratius van der Aalst is a Dutch computer scientist and full professor at RWTH Aachen University, leading the Process and Data Science (PADS) group. His research and teaching interests include information systems, workflow management, Petri nets, process mining, specification languages, and simulation. He is also known for his work on workflow patterns.
Discovery Net is one of the earliest examples of a scientific workflow system allowing users to coordinate the execution of remote services based on Web service and Grid Services standards. The system was designed and implemented at Imperial College London as part of the Discovery Net pilot project funded by the UK e-Science Programme. Many of the concepts pioneered by Discovery Net have been later incorporated into a variety of other scientific workflow systems.
The α-algorithm or α-miner is an algorithm used in process mining, aimed at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Măruşter. The goal of the Alpha miner is to convert the event log into a workflow net based on the relations between the various activities in the event log. An event log is a multi-set of traces, and a trace is a sequence of activity names. Several extensions or modifications of the algorithm have since been presented.
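The ordering relations the α-algorithm derives from an event log can be sketched as a footprint matrix. This simplified version, run on hypothetical traces, distinguishes causality (->), parallelism (||), and no direct relation (#); the full algorithm goes on to construct places of a workflow net from these relations:

```python
def footprint(traces):
    """Derive alpha-algorithm-style ordering relations between activity pairs."""
    activities = sorted({a for trace in traces for a in trace})
    follows = {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}
    relations = {}
    for a in activities:
        for b in activities:
            if (a, b) in follows and (b, a) in follows:
                relations[(a, b)] = "||"  # parallel: observed in both orders
            elif (a, b) in follows:
                relations[(a, b)] = "->"  # causality: a directly precedes b
            elif (b, a) in follows:
                relations[(a, b)] = "<-"  # reversed causality
            else:
                relations[(a, b)] = "#"   # never directly follow each other
    return relations

traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
fp = footprint(traces)
print(fp[("b", "c")])  # || : b and c appear in both orders, so they are parallel
print(fp[("a", "b")])  # -> : a causally precedes b
```

The footprint makes the interleaving of b and c visible as parallelism, which is exactly the kind of concurrent behaviour the α-algorithm encodes in the resulting workflow net.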
Artifact-centric business process model represents an operational model of business processes in which the changes and evolution of business data, or business entities, are considered as the main driver of the processes. The artifact-centric approach, a kind of data-centric business process modeling, focuses on describing how business data is changed/updated, by a particular action or task, throughout the process.
Business process conformance checking is a family of process mining techniques to compare a process model with an event log of the same process. It is used to check if the actual execution of a business process, as recorded in the event log, conforms to the model and vice versa.
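A much-simplified sketch of the idea (not full token replay or alignment computation): treat the model as a set of allowed directly-follows edges plus start and end activities, and check whether each recorded trace is a walk through that model. All names below are hypothetical:

```python
def conforms(trace, start, end, edges):
    """Check whether a trace starts at a start activity, ends at an end
    activity, and only takes steps allowed by the model's edges."""
    if not trace or trace[0] not in start or trace[-1] not in end:
        return False
    return all((a, b) in edges for a, b in zip(trace, trace[1:]))

# Hypothetical model: register -> check -> decide.
model_edges = {("register", "check"), ("check", "decide")}
starts, ends = {"register"}, {"decide"}

print(conforms(["register", "check", "decide"], starts, ends, model_edges))  # True
print(conforms(["register", "decide"], starts, ends, model_edges))           # False: skipped "check"
```

Real conformance checking techniques go further, quantifying how far a deviating trace is from the model (e.g., via alignments) rather than giving a yes/no answer.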
Mathias Weske is a German computer scientist, and Professor of Business Process Technology at the University of Potsdam, known for his contributions in the field of business process management and as a founder of the business Signavio.
Computer-assisted interventions (CAI) is a field of research and practice, where medical interventions are supported by computer-based tools and methodologies. Examples include:
The IEEE Standard 1849-2016, IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, is a technical standard developed by the IEEE Standards Association. It standardizes "a language to transport, store, and exchange event data". In 2023, the standard was revised and superseded by the IEEE Standard 1849-2023.
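For orientation, a minimal XES-style document might look like the fragment below; the case and activity values are hypothetical, and real XES logs typically also declare extensions and global attributes:

```xml
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2024-01-05T09:00:00.000+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="decide"/>
      <date key="time:timestamp" value="2024-01-05T10:00:00.000+00:00"/>
    </event>
  </trace>
</log>
```

Each trace corresponds to one case, each event to one log record, and typed attributes (string, date, etc.) carry the case ID, activity name, and timestamp.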
The IEEE Task Force on Process Mining (TFPM) is a non-commercial association for process mining. The IEEE Task Force on Process Mining was established in October 2009 as part of the IEEE Computational Intelligence Society at the Eindhoven University of Technology.
The inductive miner belongs to a class of algorithms used in process discovery. Various previously proposed algorithms produce process models of slightly different types from the same input, and the quality of the output model depends on its soundness. A number of techniques, such as the alpha miner and the genetic miner, work by converting an event log into a workflow model; however, they do not always produce sound models. The inductive miner builds a directly-follows graph from the event log and uses this graph to detect various process relations.
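One of these relations can be sketched as follows: in an inductive-miner-style exclusive-choice (XOR) cut, activities that are not connected in the directly-follows graph fall into separate branches, which amounts to computing connected components with edge directions ignored. This is only the first of several cut types the inductive miner tries (exclusive choice, sequence, parallelism, loop):

```python
def xor_cut(activities, dfg_edges):
    """Partition activities into the connected components of the
    directly-follows graph (directions ignored); disconnected groups
    form the branches of an exclusive choice."""
    parent = {a: a for a in activities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in dfg_edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for a in activities:
        groups.setdefault(find(a), set()).add(a)
    return sorted(groups.values(), key=lambda s: sorted(s))

# Hypothetical behaviour: cases do either b-c or d-e, never both.
print(xor_cut({"b", "c", "d", "e"}, {("b", "c"), ("d", "e")}))
```

Here the two disconnected groups {b, c} and {d, e} become the branches of an XOR split, and the algorithm would then recurse into each branch's sub-log.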
Process mining is a technique used to turn event data into insights and actions. Process mining techniques such as process discovery and conformance checking depend only on the order of activities executed in the operations. However, the event log contains not only the activity details but also timestamps, resources, and data accompanying process execution. Careful analysis of these additional details can reveal useful information for predicting future decisions, understanding the efficiency and working dynamics of the team, and performing performance analysis.
Streaming conformance checking is a form of conformance checking in which a deviation is reported directly when it happens. Instead of an event log, streaming conformance checking techniques take an event stream and a process model as input; each event received from the stream is compared with the model.
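A minimal sketch of the idea, with a hypothetical model given as allowed directly-follows edges and start activities: the checker keeps per-case state and flags a deviation the moment an event does not fit the model:

```python
def streaming_checker(edges, start):
    """Return a callback that consumes one (case_id, activity) event at a
    time and reports immediately whether the event fits the model."""
    last = {}  # last activity seen per case

    def on_event(case_id, activity):
        prev = last.get(case_id)
        last[case_id] = activity
        if prev is None:
            return activity in start       # first event must be a start activity
        return (prev, activity) in edges   # step must follow an allowed edge

    return on_event

# Hypothetical model: a -> b -> c.
check = streaming_checker({("a", "b"), ("b", "c")}, start={"a"})
print(check("case1", "a"))  # True
print(check("case1", "b"))  # True
print(check("case1", "d"))  # False: deviation reported as soon as it occurs
```

Because only the last activity per case is stored, the checker's memory stays bounded by the number of active cases, which is the key constraint that distinguishes streaming techniques from log-based replay.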