Process mining: discovering and improving spaghetti and lasagna processes
Let this topic serve as a general discussion of the paper mentioned in the title. The paper is available here: http://wwwis.win.tue.nl/~wvdaalst/old/publications/p615.pdf
Abstract
Process mining is an emerging discipline providing comprehensive sets of tools to provide fact-based insights and to support process improvements. This new discipline builds on process model-driven approaches and data mining. This invited keynote paper demonstrates that process mining can be used to discover a wide range of processes ranging from structured processes (Lasagna processes) to unstructured processes (Spaghetti processes). For Lasagna processes, the discovered process is just the starting point for a broad repertoire of analysis techniques that support process improvement. For example, process mining can be used to detect and diagnose bottlenecks and deviations in (semi-)structured processes. The analysis of Spaghetti processes is more challenging. However, the potential benefits are substantial; just by inspecting the discovered model, important insights can be obtained. Process discovery can be used to understand variability and non-conformance. This paper presents the L∗ life-cycle model consisting of five phases. The model describes how to apply process mining techniques.
My question is the following:
Under stage 2 of the L*-framework the following is stated:
"After completing stage 2 (create control-flow model and connect event log, ed.) there is a control-flow model tightly connected to the event log, i.e. events in the event log refer to activities in the model. This connection is crucial for subsequent steps. If the fitness of the model is low (say below 0.8)".
However, the paper does not give an exact reason why this is the case. Is it because integrating the other perspectives (organizational, time and case) becomes less valuable when a smaller part of the event log can be replayed on the process model?
I look forward to your input.
Comments
Dear Erik,
Great idea to discuss this paper in this topic, and you pose a valid question.
A fitness value below 0.8 indicates a poor alignment, i.e. many mismatches, between the observed data in the event log and the modelled behavior in the process model.
We use 0.8 as a rule of thumb for the lowest feasible replay fitness value that still allows further analysis, such as performance analysis. Below that, the alignments need so much 'guessing' to repair the mismatches that the quality of the subsequent analysis goes down drastically.
At the same time, achieving a replay fitness of 1.0 is rarely possible without sacrificing precision.
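To make the rule of thumb concrete, here is a minimal sketch of how an alignment-style fitness value could be computed and checked against 0.8. It rests on several assumptions that are not from the paper or from ProM: the model is represented as a small, hypothetical set of allowed traces (`MODEL_TRACES`), every log move and model move costs 1, and the log-level fitness is simply the mean of the per-trace values. Real alignment techniques search the state space of a Petri net instead, so treat this as an illustration only.

```python
# Simplified illustration of alignment-based replay fitness.
# Assumptions (hypothetical, for illustration only): the model is given as an
# explicit set of allowed traces, and each log move or model move costs 1.

def alignment_cost(trace, model_trace):
    """Number of log moves plus model moves needed to align the two sequences.

    With unit costs and no substitutions this equals
    len(trace) + len(model_trace) - 2 * LCS(trace, model_trace).
    """
    n, m = len(trace), len(model_trace)
    prev = [0] * (m + 1)  # LCS lengths for the previous row
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if trace[i - 1] == model_trace[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return n + m - 2 * prev[m]


def trace_fitness(trace, model_traces):
    """1 minus (cheapest alignment cost / worst-case cost).

    Worst case: every event is a log move and the shortest model run
    consists only of model moves.
    """
    best = min(alignment_cost(trace, mt) for mt in model_traces)
    worst = len(trace) + min(len(mt) for mt in model_traces)
    return 1.0 if worst == 0 else 1.0 - best / worst


def log_fitness(log, model_traces):
    """Mean trace fitness over the log (a simplification)."""
    return sum(trace_fitness(t, model_traces) for t in log) / len(log)


# Hypothetical model: a, then b and c in either order, then d.
MODEL_TRACES = [("a", "b", "c", "d"), ("a", "c", "b", "d")]

# Hypothetical event log with one fitting and two deviating traces.
LOG = [("a", "b", "c", "d"), ("a", "b", "d"), ("a", "x", "d")]

if __name__ == "__main__":
    fitness = log_fitness(LOG, MODEL_TRACES)
    print(f"log fitness: {fitness:.3f}")
    if fitness < 0.8:
        print("Below the 0.8 rule of thumb: improve the model or filter the log first.")
```

In the third trace, three of the moves in the cheapest alignment are repairs (one log move for `x`, two model moves for `b` and `c`). Any timestamps projected onto `b` and `c` would have to be guessed, which is why performance analysis degrades as fitness drops.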
I hope this answers your question!
Joos Buijs
Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
Previously Assistant Professor in Process Mining at Eindhoven University of Technology