Sequence Identification in the Inductive Miner
For my master's thesis, I am evaluating process mining for modelling sequential data. As high fitness is a requirement in my case, I am using the base Inductive Miner. When using this miner during my experiments, I got results that I cannot explain, hence this question here on the forum.
Below is the render of a part of my model (unrelated branches have been removed for simplicity). Green nodes indicate parallel gateways, and yellow nodes exclusive choice gateways.
Following this model, the tasks `resHJ|wireless`, `acctManip|wireless` and `ACE|wireless` are completely in parallel with the sequence `exfil|wireless, dManip|wireless, remoteexp|wireless, rPrivEsc|wireless`. However, in the dataset I am using, the seven tasks only appear in the same sequence:
- exfil|wireless dManip|wireless resHJ|wireless ACE|wireless remoteexp|wireless acctManip|wireless rPrivEsc|wireless
Of course, the model still gives perfect fitness as the sequence is still possible in the model shown, but there is no evidence in the underlying data to show a parallel relation between the tasks.
The way I see this, the directly-follows graph cannot contain any indication of parallelism between the resHJ, acctManip and ACE tasks and the four tasks which are in the sequence. Besides, following the paper on the Inductive Miner (https://www.win.tue.nl/~dfahland/publications/LeemansFA_2013_blockstructured.pdf), a sequential cut is always considered before a parallel cut, hence this can also not be explained by the miner favouring parallelism over sequentiality.
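A minimal sketch of this reasoning, assuming the single trace listed above is the only place where the seven tasks occur. The helper names `directly_follows` and `bidirectional_pairs` are hypothetical and not part of ProM or the benchmark code base. As I read the paper, a parallel cut needs directly-follows edges in both directions between activities in different parts, so the sketch simply checks whether any such pair exists:

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def bidirectional_pairs(dfg):
    """Return the pairs of activities connected by edges in both directions."""
    return sorted((a, b) for (a, b) in dfg if a < b and (b, a) in dfg)


# The only sub-sequence in which the seven tasks occur in the dataset.
trace = (
    "exfil|wireless dManip|wireless resHJ|wireless ACE|wireless "
    "remoteexp|wireless acctManip|wireless rPrivEsc|wireless"
).split()

dfg = directly_follows([trace])
for (a, b), count in dfg.items():
    print(f"{a} -> {b} ({count})")

# An empty list means no pair of tasks is ever observed in both orders.
print(bidirectional_pairs(dfg))
```

On this single trace the graph is a plain chain of six edges with no pair observed in both orders, which is why I would expect a sequence cut rather than a parallel cut.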
Hence my question: is this a known issue with the Inductive Miner, or is there something else going on which I am missing?
As a reference: I am generating these models using the code base used in "Automated Discovery of Process Models from Event Logs: Review and Benchmark" (Paper: https://arxiv.org/pdf/1705.02288.pdf, codebase: https://github.com/raffaeleconforti/ResearchCode).
Using ProM 6.10, I get the same models. Furthermore, the same pattern occurs when using the Inductive Miner-Infrequent with the default 20% setting, but to a lesser extent.
Hi Geert,

A possible explanation for this can be that the problematic labels resHJ|wireless, acctManip|wireless, and ACE|wireless also appear elsewhere in the process, like in the grey dot or in unrelated branches that you have now removed for simplicity.

As an example, if the grey dot would somehow include the sequence acctManip|wireless ACE|wireless resHJ|wireless, then this result of the InductiveMiner would make sense, as any two of these three activities can then appear in any order.

Kind regards,
Eric.
Thank you for your answer, but this is not the case.
Looking at my dataset, the seven tasks shown in the diagram only occur in that sub-sequence, and never occur in any other trace.
To verify this issue, I also ran some experiments where I inserted artificial sequential dependencies for some tasks. For this, I replaced all occurrences of certain tasks in the dataset with a sequence of artificial tasks.
The page below links to two example models I got from running some experiments (for some reason I cannot upload images to this forum).
For the first model, I replaced all occurrences of `ACE|wireless` with a sequence of artificial tasks `ACE|wireless ACE|wireless_0 ACE|wireless_1 ACE|wireless_2`. I would expect this sequence to be picked up by the Inductive Miner, as the only dependency these artificial tasks have is a sequential dependency between each other.
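The replacement step can be sketched as follows. This is an illustration only: it assumes the log is held as a list of traces, each a list of task labels, and the function name `expand_task` and the `n_extra` parameter are hypothetical rather than taken from my actual preprocessing code.

```python
def expand_task(traces, task, n_extra=3):
    """Replace every occurrence of `task` by `task` followed by n_extra artificial tasks."""
    replacement = [task] + [f"{task}_{i}" for i in range(n_extra)]
    expanded = []
    for trace in traces:
        new_trace = []
        for activity in trace:
            if activity == task:
                new_trace.extend(replacement)
            else:
                new_trace.append(activity)
        expanded.append(new_trace)
    return expanded


# Toy log; only the three consecutive labels from the real sub-sequence are used here.
log = [["resHJ|wireless", "ACE|wireless", "remoteexp|wireless"]]
print(expand_task(log, "ACE|wireless"))
```

Because the artificial tasks only ever appear directly after one another, the only directly-follows edges they introduce form a chain.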
As an alternative experiment, I changed the data to create an exclusive-choice dependency between `ACE|wireless` and the artificial tasks. I did this by looking for traces in which the task occurs, and adding new traces where the task is replaced by the artificial task. The sub-model from this experiment is also shown on the page.
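A rough sketch of this second transformation, under the assumption that one extra trace is added per artificial task for every trace containing the original task. The function name `add_choice_traces` and the list-of-lists log representation are hypothetical.

```python
def add_choice_traces(traces, task, artificial_tasks):
    """Keep all original traces and, for each trace containing `task`,
    add one copy per artificial task in which `task` is replaced by it."""
    result = [list(trace) for trace in traces]
    for trace in traces:
        if task in trace:
            for alternative in artificial_tasks:
                result.append(
                    [alternative if activity == task else activity for activity in trace]
                )
    return result


# Toy log; the second trace does not contain the task and is left untouched.
log = [
    ["resHJ|wireless", "ACE|wireless", "remoteexp|wireless"],
    ["exfil|wireless", "dManip|wireless"],
]
for trace in add_choice_traces(log, "ACE|wireless", ["ACE|wireless_0"]):
    print(" ".join(trace))
```

Since the original and the artificial task never occur in the same trace but share the same predecessors and successors, the expected relation between them is an exclusive choice.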
Here I see that the Inductive Miner correctly found the relation between the artificial tasks and moved it to the correct place in the full sequence. However, the unaffected task `exfil|wireless` was moved outside the sequence and into the surrounding parallel structure.
Hopefully, this makes my finding clearer. If you want to experiment for yourself, I can also work on an example dataset where this issue occurs.
Hi Geert,

Can you share one of these logs with me (email@example.com)? That may help in finding out what happens here.

Kind regards,
Eric.