
Discovering Deviating Cases and Process Variants Using Trace Clustering

The excellent article "Discovering Deviating Cases and Process Variants Using Trace Clustering" by B.F.A. Hompes, J.C.A.M. Buijs, W.M.P. van der Aalst, P.M. Dixit and J. Buurman describes a case clustering algorithm based on "perspectives". A perspective is a map from cases to tuples of real numbers, for instance the assignment of the hospital length of stay to each patient flow. According to the article, the clusters are built using the cosine similarity of the cases. The ProM TraceClustering plugin, which implements the algorithms described in the article, does not allow me to specify the similarity of two cases directly. The plugin only allows me to select the "dimensions" to be used by the algorithm. In my log I have selected two attributes that contain nominal variables: "department" and "diagnosis" (the log file describes the activity of an emergency room). If my understanding is correct, the clusters identified by the algorithm contain patient cases that have the same department and the same diagnosis. The links between the clusters connect patients who may, for instance, have different diagnoses but the same department. Related to this plugin is a second plugin that requires an MCL clustering parameters input file. How can I create such a file? Does this allow me to modify the similarity function?

Comments

  • Dear Mauro,

    First of all, I am glad you liked our paper! Sorry I did not find your question earlier.

    At the moment, the implementation works as follows: each case in the event log is mapped to a so-called profile vector according to the chosen perspectives. For now, the dimensions of this vector can be binary (a value of the perspective is present or not) or integer (the number of occurrences of a value for a certain perspective).

    If you take, for example, the occurrence of department as a clustering perspective, and there are five departments in total in your event log, each case will be mapped to a vector of length five (every department is one dimension). The value for each dimension will be 0 or 1, depending on whether that department is present in the case.

    From these profile vectors, a case similarity matrix is built by computing case-by-case similarity using the cosine similarity function. This matrix is the input for the Markov clustering algorithm as described in the paper.

    The version of the plug-in that takes a parameters object as additional input is meant to be used from the non-GUI version of ProM, because this object cannot be created using the GUI. Also, this object only holds the chosen perspectives and the expansion and inflation parameters for the MCL algorithm. It has nothing to do with the similarity function.

    Using the ProM GUI, it is not possible to use a different similarity function, or to specify that, e.g., department X is more similar to department Y than to department Z. For this, you would need to implement your own custom similarity function. As luck would have it ( :wink: ), the implementation (which you can find at https://svn.win.tue.nl/repos/prom/Packages/TraceClustering/) is written in a generic way, so you can actually plug in a custom similarity function if you want to, by running ProM from code.

    I hope this answers your question.

    Bart
    Bart Hompes - Eindhoven University of Technology
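
The steps described in the reply above (binary or count profile vectors, then a case-by-case cosine similarity matrix) can be illustrated with a small sketch. This is not the ProM plug-in's code, which is written in Java; it is a minimal Python illustration under stated assumptions. The department names, the case data, and all function names (`profile_vector`, `cosine_similarity`, `similarity_matrix`, `soft_cosine`) are hypothetical. The last function shows one possible custom similarity, a "soft cosine" weighted by a value-to-value similarity table, as one way to express that department X is closer to department Y than to department Z. It is an assumption, not something the paper or the plug-in prescribes.

```python
from math import sqrt


def profile_vector(case_values, all_values, binary=True):
    """Map the attribute values seen in one case to a profile vector.

    One dimension per distinct value; entries are 0/1 (binary) or counts.
    """
    if binary:
        return [1 if v in case_values else 0 for v in all_values]
    return [case_values.count(v) for v in all_values]


def cosine_similarity(a, b):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors, similarity=cosine_similarity):
    """Case-by-case similarity matrix; the similarity function is pluggable."""
    return [[similarity(a, b) for b in vectors] for a in vectors]


def soft_cosine(value_sim):
    """Build a custom similarity from a value-to-value similarity table.

    value_sim[i][j] says how similar value i is to value j (1.0 on the
    diagonal). With the identity table this reduces to plain cosine.
    """
    def bilinear(a, b):
        n = len(a)
        return sum(a[i] * value_sim[i][j] * b[j]
                   for i in range(n) for j in range(n))

    def similarity(a, b):
        denom = sqrt(bilinear(a, a) * bilinear(b, b))
        return bilinear(a, b) / denom if denom else 0.0

    return similarity


# Hypothetical event-log data: five departments, three cases.
departments = ["cardiology", "neurology", "orthopedics", "radiology", "surgery"]
cases = {
    "case1": ["cardiology", "radiology"],
    "case2": ["cardiology"],
    "case3": ["surgery", "orthopedics"],
}

vectors = {cid: profile_vector(vals, departments) for cid, vals in cases.items()}
matrix = similarity_matrix(list(vectors.values()))

# Assumed domain knowledge: cardiology and surgery are "half similar".
n = len(departments)
value_sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
value_sim[0][4] = value_sim[4][0] = 0.5
custom_matrix = similarity_matrix(list(vectors.values()), soft_cosine(value_sim))

for row in custom_matrix:
    print([round(x, 3) for x in row])
```

With plain cosine similarity, case2 and case3 share no department and therefore have similarity 0. With the custom table they get a positive similarity, which is the kind of effect a custom similarity function would have on the matrix that is handed to the Markov clustering step.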