
Q: Token-replay, identically labelled transitions, and backtracking

Hello there,

I am confused about how token-replay algorithms (and in particular those available in ProM 5) make a choice between backtracking and creating missing tokens when checking a trace for conformance to an accepting Petri net.

I am supposing here that the Petri net may contain multiple identically labelled transitions, and that it comes from anywhere, not necessarily one of the well-known miners, but perhaps created manually.

When dealing with such nets, I believe we will need an algorithm that can backtrack, because it is always possible to fire the wrong transition for an activity. Indeed, we might simply be on the wrong branch of the search tree, and all the work involved in creating a missing token and continuing with that token might be futile, because in reality the trace should fit the model perfectly, if only we had taken another transition earlier.

For example, consider the simple case where the net contains a loop, and the first transition after the loop carries the same label as the first transition inside the loop (let's call both "X"): when the next activity in the trace is "X" and you do not look farther ahead, you simply cannot know whether you should re-enter the loop or exit it. So when the activity after that "X" arrives (let's say "Y"), but you re-entered the loop on the previous step, the right thing to do would be to backtrack and exit the loop, not to create a token for a spurious missing "X". (You have the converse problem when you exited the loop on the previous step, but another "X" arrives instead of "Y".)
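To make this concrete, here is a toy sketch (in Python; the net, place names, and transition names are all invented for illustration, and final-marking checks are omitted for simplicity) of the kind of backtracking replay I have in mind: it tries every enabled transition carrying the matching label, and only reports a misfit if no branch works.

```python
from collections import Counter

# Toy accepting Petri net with two transitions labelled "X":
# "enter" re-enters the loop, "exit" leaves it.
TRANSITIONS = {
    "enter": {"label": "X", "pre": Counter({"p0": 1}), "post": Counter({"p1": 1})},
    "back":  {"label": "Z", "pre": Counter({"p1": 1}), "post": Counter({"p0": 1})},
    "exit":  {"label": "X", "pre": Counter({"p0": 1}), "post": Counter({"p2": 1})},
    "after": {"label": "Y", "pre": Counter({"p2": 1}), "post": Counter({"p3": 1})},
}

def enabled(marking, t):
    return all(marking[p] >= n for p, n in t["pre"].items())

def fire(marking, t):
    m = Counter(marking)
    m.subtract(t["pre"])
    m.update(t["post"])
    return m

def replays(marking, trace):
    """True iff the whole trace can be replayed without missing tokens,
    trying every identically labelled enabled transition (backtracking)."""
    if not trace:
        return True
    act, rest = trace[0], trace[1:]
    for t in TRANSITIONS.values():
        if t["label"] == act and enabled(marking, t):
            if replays(fire(marking, t), rest):
                return True
    return False

m0 = Counter({"p0": 1})
print(replays(m0, ["X", "Z", "X", "Y"]))  # True: backtrack, then exit on the 2nd X
print(replays(m0, ["X", "Z", "Y"]))       # False: no choice of X's makes this fit
```

A greedy replay that always picked "enter" for "X" would create a missing token for "Y" on the first trace, even though the trace fits the model perfectly.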

What token-replay algorithms are there that can handle such cases, and how do they handle the choice between token creation and backtracking?

I'd appreciate pointers to papers, and explanations of the algorithms that are in ProM (ProM 5, I guess, as ProM 6 seems to be mainly about alignments, not token replay).

This might be a very basic question, but I haven't found a well-explained answer in the books and papers I have perused.

-- Sebastian





Comments

  • Dear Sebastian,

    For details on the token replay in ProM 5, it would be best to check the PhD thesis of Anne Rozinat, which you can find here: http://alexandria.tue.nl/extra2/690060.pdf. This thesis describes the token replay as implemented in ProM 5.

    Kind regards,
    Eric.

  • Thanks for answering. But, well, yes, I was hoping for an improvement on section 4.4.2 of Rozinat's thesis. She says:

    Finally, the log replay used to calculate the fitness metric has to use a heuristic, local approach to limit its computational complexity in the case that a model contains invisible or duplicate tasks. Unfortunately, this means that it cannot be guaranteed that if the log fits the model it can be replayed correctly (and thus any mismatch really indicates a conformance problem). For example, we choose the shortest sequence of invisible tasks to enable the currently replayed task if possible. However, from a global viewpoint it could always be the case that firing some longer sequence would actually produce exactly those tokens that are needed in a later stage of the replay. Dealing with this issue in a global manner (i.e., minimizing the number of missing and remaining tokens during log replay) seems intractable for complexity reasons.
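The "shortest sequence of invisible tasks" heuristic she mentions is essentially a breadth-first search over firings of invisible (tau) transitions. A toy sketch of that idea (the net and all names are invented for illustration; this is not her actual code):

```python
from collections import Counter, deque

# Invisible transitions carry label None; "B" is only reachable
# after firing the tau chain p0 -> p1 -> p2.
TRANSITIONS = {
    "tau1": {"label": None, "pre": Counter({"p0": 1}), "post": Counter({"p1": 1})},
    "tau2": {"label": None, "pre": Counter({"p1": 1}), "post": Counter({"p2": 1})},
    "b":    {"label": "B",  "pre": Counter({"p2": 1}), "post": Counter({"p3": 1})},
}

def enabled(m, t):
    return all(m[p] >= n for p, n in t["pre"].items())

def fire(m, t):
    c = Counter(m)
    c.subtract(t["pre"])
    c.update(t["post"])
    return c

def canon(m):
    return frozenset((+m).items())  # drop zero entries so markings compare equal

def shortest_tau_prefix(m0, act):
    """BFS over invisible firings; return the shortest tau sequence after
    which some transition labelled `act` is enabled, or None."""
    queue = deque([(m0, [])])
    seen = {canon(m0)}
    while queue:
        m, seq = queue.popleft()
        if any(t["label"] == act and enabled(m, t) for t in TRANSITIONS.values()):
            return seq
        for name, t in TRANSITIONS.items():
            if t["label"] is None and enabled(m, t):
                m2 = fire(m, t)
                if canon(m2) not in seen:
                    seen.add(canon(m2))
                    queue.append((m2, seq + [name]))
    return None

print(shortest_tau_prefix(Counter({"p0": 1}), "B"))  # ['tau1', 'tau2']
```

The locality she describes is visible here: BFS commits to the shortest enabling sequence, even when a longer tau sequence would leave tokens that a later event needs.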

    She discusses choosing the correct one of multiple tasks in more detail when she presents her "algorithm 2":

    if there are more [than one] candidates enabled, the remaining log events must be considered to determine the best choice. For these purposes, chooseBestCandidate() is called. It makes a copy of the current replay scenario for each enabled duplicate and fires the transition belonging to that candidate (i.e., it starts to mimic every case). Then the entry point for the recursive method tracking these scenarios traceBestCandidate() is reached and will not return until there is only one scenario left, which can then be reported to the initial caller to proceed with the actual log replay.

So a) recursively trying every case with unbounded look-ahead sounds suspiciously like exponential complexity, b) the possibility that invisible tasks indirectly enable other transitions is disregarded, and c) it is accepted that a trace that actually fits the model perfectly may be assigned a fitness < 1.
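Point a) is easy to illustrate with a toy sketch (the net and all names are invented; this is not the ProM implementation): if you clone a replay scenario for every enabled duplicate, as chooseBestCandidate() does, each ambiguous event doubles the number of live scenarios.

```python
from collections import Counter

# Two interchangeable transitions share the label "A" and both
# keep p0 marked, so both stay enabled on every "A" event.
TRANSITIONS = [
    {"label": "A", "pre": Counter({"p0": 1}), "post": Counter({"p0": 1, "p1": 1})},
    {"label": "A", "pre": Counter({"p0": 1}), "post": Counter({"p0": 1, "p2": 1})},
]

def enabled(m, t):
    return all(m[p] >= n for p, n in t["pre"].items())

def fire(m, t):
    c = Counter(m)
    c.subtract(t["pre"])
    c.update(t["post"])
    return c

def scenarios(trace, m0):
    """All replay scenarios (sequences of duplicate choices) for the trace."""
    front = [(m0, ())]
    for act in trace:
        front = [
            (fire(m, t), path + (i,))
            for m, path in front
            for i, t in enumerate(TRANSITIONS)
            if t["label"] == act and enabled(m, t)
        ]
    return front

print(len(scenarios(["A"] * 10, Counter({"p0": 1}))))  # 1024 = 2**10 scenarios
```

Ten ambiguous events already give 2^10 scenarios to track, which is why she restricts the look-ahead to a local heuristic.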

    I was hoping that someone had thought of something better in the last decade...

  • There is an interesting paper that presents a new and improved token-based conformance check, which is implemented in pm4py.

    Berti, Alessandro and Wil M. P. van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@Petri Nets/ACSD (2019)

    However, Berti & v. d. Aalst also do not explicitly address my question.

  • Hi Sebastian,

    Alignments are considered "something better" than token replay. Although computing alignments is also quite complex, it avoids some of the issues of token replay. So perhaps not much work has been done on token replay, but work has been done on alignments instead.

    Kind regards,
    Eric.
    
  • Hi Sebastian,

    Seems our messages have crossed each other.

    Then perhaps it would be best to contact Alessandro Berti. Although this paper may not address your question, you could ask him.

    Kind regards,
    Eric.



