Event tag order in traces
Hey all!
I'm trying to manually extract data from an ERP-like data source and build XES event logs from it. This has worked fairly smoothly so far, but I hit a snag when I noticed that the process models I mine from these logs do not look the way I expect.
For example, look at the two attached toy logs. The 'ordered' log contains three traces in which the events appear in the same order in every trace, and that order also matches their timestamps. For the 'jumbled' version, I simply swapped the order of events within two of the traces without modifying their timestamps.
Now, I would have expected to obtain identical process models, since the process flow should be dictated by the actual timestamps. Instead, it appears that the order of the <event>s inside the <trace>s carries semantic value and is used by the mining algorithms when constructing the model (I tried the inductive miner and the heuristic miner; see the attached images for the results).
Is this a quirk, or intended behavior / industry convention? If it is intended, what is the rationale behind it (i.e., is there any benefit in considering the order of the event tags in the log in addition to timestamp order)?
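To illustrate the mechanism, here is a minimal Python sketch (an illustration, not ProM code) of a directly-follows relation, the ordering information that discovery algorithms such as the heuristic miner and the inductive miner build on. The trace, activity names and timestamps are hypothetical, not taken from the attached logs. If events are read in file order, swapping two of them changes the relation even though their timestamps are untouched.

```python
from datetime import datetime

# Hypothetical toy trace: (activity, timestamp) pairs listed in file order.
# "approve order" appears before "check stock" in the file, although it
# happened later according to its timestamp.
jumbled_trace = [
    ("create order", datetime(2020, 1, 1, 9, 0)),
    ("approve order", datetime(2020, 1, 1, 11, 0)),
    ("check stock", datetime(2020, 1, 1, 10, 0)),
]


def directly_follows(trace):
    """Return the set of (a, b) pairs where b immediately follows a in the list."""
    activities = [activity for activity, _ in trace]
    return set(zip(activities, activities[1:]))


# Relation as seen by an algorithm that trusts the order of events in the file.
print(sorted(directly_follows(jumbled_trace)))
# [('approve order', 'check stock'), ('create order', 'approve order')]

# Relation after sorting the same events by timestamp.
by_time = sorted(jumbled_trace, key=lambda event: event[1])
print(sorted(directly_follows(by_time)))
# [('check stock', 'approve order'), ('create order', 'check stock')]
```

The two relations share no pair, so a miner working from them can produce different models from the same set of timestamped events.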
Also, any comments on the structure of my logs are much appreciated (the ones attached here are highly simplified, but they capture most of what I'm trying to do).
Cheers!
Comments
Dear 'djr',
This is intended behavior: events are expected to be ordered, and the algorithms use the events in the order provided. This is mainly done to handle the case where timestamps are recorded at a coarser granularity (e.g. per day), so that two events cannot always be sorted by their timestamps alone.
I hope this answers your question.
Joos Buijs
Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
Previously Assistant Professor in Process Mining at Eindhoven University of Technology
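Given this behavior, one option when you want the mined model to follow timestamps rather than file order is to sort the events of each trace by `time:timestamp` before loading the log into ProM. Below is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The function names (`sort_trace_events`, `sort_xes_file`) are hypothetical, and it assumes every event carries a `<date key="time:timestamp" .../>` child with an ISO 8601 value, with all values either including or omitting a UTC offset (Python cannot compare the two kinds). Because Python's `sorted()` is stable, events with equal timestamps (the day-granularity case above) keep their original order in the file.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

XES_NS = "http://www.xes-standard.org/"
# Write the XES namespace back as the default one instead of an "ns0:" prefix.
ET.register_namespace("", XES_NS)


def local_name(tag):
    """Strip a '{namespace}' prefix so the code works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def event_timestamp(event):
    """Return the event's time:timestamp as a datetime (assumed to be present)."""
    for attr in event:
        if local_name(attr.tag) == "date" and attr.get("key") == "time:timestamp":
            # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
            return datetime.fromisoformat(attr.get("value").replace("Z", "+00:00"))
    raise ValueError("event without a time:timestamp attribute")


def sort_trace_events(root):
    """Reorder the events inside every <trace> by timestamp (stable sort)."""
    # Collect the traces first so the tree is not modified while iterating it.
    traces = [el for el in root.iter() if local_name(el.tag) == "trace"]
    for trace in traces:
        children = list(trace)
        events = [c for c in children if local_name(c.tag) == "event"]
        ordered = iter(sorted(events, key=event_timestamp))
        # Refill only the event slots; trace-level attributes stay in place.
        new_children = [next(ordered) if local_name(c.tag) == "event" else c
                        for c in children]
        for child in children:
            trace.remove(child)
        trace.extend(new_children)
    return root


def sort_xes_file(in_path, out_path):
    """Hypothetical helper: read an XES file, sort its events, write a copy."""
    tree = ET.parse(in_path)
    sort_trace_events(tree.getroot())
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

For example, `sort_xes_file("jumbled.xes", "jumbled-sorted.xes")` (hypothetical file names) would write a time-ordered copy of the log. Note that ElementTree discards XML comments by default, so check the output if your logs contain any.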