Q: Inductive Miner directly follows - weaker requirements loop footprints?
If an activity from a redo part has a connection to/from the body part, then it has connections to/from all start/end activities
I wonder if this requirement may not be relaxed to the following:
If an activity from a redo part has a connection to/from the body part, then
either it has connections to/from all start/end activities
or all those connections are to start or from end activities.
The difference is that we would still be able to find a loop cut if a redo part is not connected to all start/end activities, but in case it isn't, is not connected to anything but start/end activities. This would cover some cases of incompleteness.
Consider Sander's example in figure 6.8 (a) on p. 214 of his thesis. (This example is taken out of context, as it is intended to illustrate IMc, but a similar point can be made here about IMd.) Sander presents log99:
(a, b, c, d, e, f, a, c, b, d)
(a, c, d, b, e, f, a, b, c, d)
(c, a, b, d)
(c, a, d, b)
(c, d, a, b)
Inductive Miner directly follows returns a flower model after first applying the fall-through strictDfgTauLoop. With my revision, we would be able to find a loop cut and derive the following process tree:
LOOP(PARA(SEQ(a,b), SEQ(c,d)), SEQ(e,f))
which would capture the facts that a always occurs before b, c before d and the redo part must contain e and f.
Here is another example. Consider the log
(a, b, c, a, b, c, f)
(a, b, d, e, a, b, d, f)
Intuitively, we might expect to get a tree from this log that has a loop which either ends in c and loops back with a tau, or ends in d and loops back with an e. However, original IMd would reason as follows: When c, d are identified as end nodes (by the split after the first sequence cut) we cannot put e into the redo part, because e is not connected from end node c. So no loop cut exists and strictDfgTauLoop applies, which removes the edge from c to a. We can then discover another sequence and are left with an almost-loop, in which a similar problem applies: e is only connected from d, but not connected from the new end node b. In this case, there isn't even an edge from b or d to a, so strictDfgTauLoop does not apply, and we get a flower model over {a,b,d,e}.
Things change when we add the trace (a, b, c, e, a, b, c, f) to the log. This trace contains an edge from c to e. IMd then derives the following structure, which is closer to our expectations:
SEQ(LOOP(SEQ(LOOP(SEQ(a,b,XOR(c,tau)),tau),XOR(d,tau)),e),f)
With my revision, IMd would derive this identical structure from the original (incomplete) log.
Can anyone see anything wrong with my proposed improvement to IMd? (Sander, please?)
Comments
-
Here's another example, this time not involving incompleteness, but a loop where the redo part is connected to the body:(a, b, c, d, e, a, b, c, d)(a, b, c, d, e, b, c, d)(c, d)With the original requirement, no cut or fall-through would apply in IMd and we would find a flower model over {a,b,c,d,e}. (In fact, ProM embeds that flower model in an XOR with a tau-branch, thus including the empty trace, the reason for which escapes me.)With the relaxation, we can find a loop cut. We can place {e,b} into the redo part, because from that set node e is connected from all end nodes (d ), while the successors of b are start nodes (c ). So we can derive the following tree:LOOP(XOR(a,SEQ(c,d)),SEQ(XOR(e,tau),XOR(b,tau)))
Howdy, Stranger!
Categories
- 1.6K All Categories
- 45 Announcements / News
- 225 Process Mining
- 6 - BPI Challenge 2020
- 9 - BPI Challenge 2019
- 24 - BPI Challenge 2018
- 27 - BPI Challenge 2017
- 8 - BPI Challenge 2016
- 68 Research
- 1K ProM 6
- 394 - Usage
- 288 - Development
- 9 RapidProM
- 1 - Usage
- 7 - Development
- 54 ProM5
- 19 - Usage
- 187 Event Logs
- 32 - ProMimport
- 75 - XESame