
How to evaluate process models extracted from trace clustering

I would like to know how to evaluate the results (the discovered process models) obtained from the trace clustering plugin. Is this done with the conformance checking plugin? The question came up while I was reading the paper "Discovering Deviating Cases and Process Variants Using Trace Clustering", where the results are evaluated using two metrics, cluster entropy and split rate. In other research papers I have also seen evaluations based on metrics such as fitness, structural appropriateness, and behavioral appropriateness. How can the values for those metrics be obtained? Could you please answer this question, and kindly correct me if I have misunderstood something in PM?

Thanks and regards,
Gayan Buddhika

P.S. I am doing my undergraduate research on discovering meaningful process models from flexible business processes.


  • Dear Gayan,

    There's no real answer to your question other than "it depends".
    Typically, you cluster the cases in an event log for a specific reason, for example to get more structured process models or to find deviating cases. How to evaluate a clustering result strongly depends on the goal you set out to achieve.

    Evaluations without conformance checking typically look at clustering properties such as entropy, split rate, and cohesion/coupling, whereas conformance checking gives you model-based metrics such as fitness.

    If your aim is not to improve model quality but, for example, just to find related cases for use in performance predictions, the former techniques seem more suitable. If you want to know more about the latter, model-based metrics, I would suggest reading up on conformance checking.

    In your email you wrote that you're working on identifying process variants in flexible environments. This sounds like your goal is to find more structured process models for individual clusters, which represent your variants. Several trace clustering techniques with this aim have been proposed in the literature. If, on the other hand, a process variant to you represents a non-structural similarity between cases, another evaluation method would be more suitable.

    Kind regards,

    Bart Hompes
    Bart Hompes - Eindhoven University of Technology
  • Dear bhompes,

    Thanks for your response.
    I have another question. My goal is to find usage patterns (scenarios) in a dataset taken from a flexible environment. Using Disco, I can find many distinct process variants in my dataset; it shows hundreds of them. I tried to cluster the cases using the trace clustering techniques implemented in ProM 5.2. They give me results, but I had to predefine the number of clusters. I also tried the MCL algorithm in ProM 6.6. In the literature I found a couple of papers that use MCL (yours and "Mining usage scenarios in business processes: Outlier-aware discovery and run-time prediction" by Francesco Folino et al.). They use it mainly for outlier and deviation detection. As I understand it, applying MCL clustering successfully requires the dataset to have enough case attribute data.

    Why are the other trace clustering techniques not implemented in ProM 6.6? Are those techniques considered obsolete? Can I use MCL clustering for my requirement too? Could you kindly suggest which clustering techniques are appropriate for my task? Based on your previous answer, I think the conformance checking metrics are suitable for evaluating the discovered process models (usage patterns).

    Thanks and Regards,
    Gayan Buddhika.
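
For readers who want to try the clustering-based metrics mentioned in the reply above (entropy and split rate), here is a minimal Python sketch. It is not the ProM plugin's implementation, and the definitions are assumptions: entropy is taken as the Shannon entropy of known class labels inside each cluster, and split rate as the average number of clusters over which cases of the same class are spread. The paper's exact definitions may differ. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one non-empty cluster."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average of the per-cluster entropies."""
    total = sum(len(labels) for labels in clusters.values())
    return sum(len(labels) / total * cluster_entropy(labels)
               for labels in clusters.values())


def split_rate(clusters):
    """Average number of clusters in which each class appears (assumed definition)."""
    clusters_per_class = defaultdict(set)
    for cluster_id, labels in clusters.items():
        for label in labels:
            clusters_per_class[label].add(cluster_id)
    return sum(len(ids) for ids in clusters_per_class.values()) / len(clusters_per_class)


# Toy data (hypothetical): cluster id -> known class label of each case in it.
clusters = {
    "c1": ["A", "A", "A", "A"],
    "c2": ["A", "B", "B", "B"],
}

print("weighted entropy:", weighted_entropy(clusters))
print("split rate:", split_rate(clusters))
```

Lower entropy means purer clusters, and a split rate close to 1 means each class ends up mostly in a single cluster. Both metrics need a ground-truth label per case, which is why they suit experiments on labeled or synthetic logs rather than unlabeled real-life data.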