To prevent spam users, you can only post on this forum after registration, which is by invitation. If you want to post on the forum, please send me a mail (h DOT m DOT w DOT verbeek AT tue DOT nl) and I'll send you an invitation in return for an account.

Q: How does Inductive Miner directly follows detect embedded optional sequences?

Again, "Inductive Miner - directly follows" in ProM 6.10 has me stumped. The implementation obviously has a way to detect optional sequences within sequences, which is nice. However, while reading Sander Leemans' Ph. D. thesis, I cannot find the specification for this behaviour. The specification in the dissertation seems to lead to a flat structure in those examples. This is surprising, as a relevant example showing the nested structure is contained in Sander's thesis itself. So the example and the behaviour of ProM 6.10 match, but I can't for the life of me find the section of the thesis where this is specified.

Can anyone point me to that section, please?

Here's the example:

Consider this log:
(a, b, c),
(a, c),
(a, e),
(b, c, d, e),
(c, d),
(c, e),

This log will lead to the DFG that is shown in Figure 5.20 of the thesis as the graph of process tree M61.
The graph has edges (a, b), (a, c), (a, e), (b, c), (c, d), (c, e), (d, e) where a, b, c, e are start nodes and a, c, d, e are end nodes.

As I understand the definition of sequence cut and the function sequenceDfgSplit, we should get a flat sequence in which every activity is optional (that is, has a tau activity as an alternative). This flat structure does not capture the facts that if b does not end the process, then c must occur afterwards, and if d occurs, then c must occur before it. ProM discovers a process tree that captures these facts by discovering top-level clusters (a), (b,c,d), (e), each of which according to sequenceDfgSplit is optional, and then recursing on (b,c,d) to find a sequence cut in which c is not optional.

I do not understand what part of the specification allows the miner to derive that result, in particular, how it manages to group (b,c,d) into one cluster on the top-level.

-- Sebastian


  • Sebastian
    edited May 2021
    Update: I have looked at the source code of CutFinderIMSequence, and there's the following passage:

    * Optimisation 4-8-2015: do not greedily use the maximal cut, but
    * choose the one that minimises the introduction of taus.

    So obviously the implementation has been enhanced and differs from the thesis. I'd still like to know if this is a spur-of-the-moment implementation decision, or if this is theoretically described in a paper somewhere.

    In particular, there might be several alternative non-maximal cuts to consider. Are there any results on how much this slows up the algorithm?
Sign In or Register to comment.