
Classification of events

edited January 2012 in - XESame
I'm new to the process mining world and am trying to muddle my way through it. I could use some advice, however, on how to classify some events.

I have an event log in which the same events happen over multiple years. I want to run a social network analysis to see whether there are groups of people who perform activities together. For example:

User  Activity  Date
1     A         2000
2     A         2000
3     B         2000
1     C         2001
2     C         2001
3     A         2001

This should show that users 1 and 2 are performing A and C together. I'm not sure, however, what the proper way to label the information would be.

I would appreciate any advice or links to resources to help.


Best Answers

  • Isabel
    edited January 2012 Accepted Answer
    Dear RobJanzen,

    there are a few possibilities:

    ProM 5.2: Social Network Miner -> Similar Task

    ProM 6.1: Mine for a Similar Task Social Network

    I quote a short definition of "similar task" below; you should probably use "Hamming distance" as the metric.

    Similar task metric: It does not consider how individuals work together on shared cases but focuses on the activities they do. The assumption here is that people doing similar things have stronger relations than people doing completely different things. Each individual has a “profile” based on how frequent they conduct specific activities. There are four kinds of distance metrics:

    Euclidean distance is the “ordinary” distance between two points that one would measure with a ruler. (It only gives good results if performers execute comparable volumes of work.)
    Pearson’s correlation coefficient is frequently used to find the relationship among cases.
    Similarity coefficient is a statistic used for comparing the similarity and diversity of sample sets.
    Hamming distance does not consider the absolute frequency but only whether it is 0 or not.

    Another idea is to have a look at the Originator by Task Matrix (ProM 5.2, Analysis). Each individual column shows you the originators performing a specific task.

    Kind regards,
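
As a rough illustration of the two suggestions in this post, the sketch below builds an originator-by-task count matrix from the sample rows in the question and then compares users with a Hamming distance on their profiles. It is not ProM's implementation; the function names `originator_by_task` and `hamming_distance` are made up for this example.

```python
from itertools import combinations

# Sample rows from the question: (user, activity, year).
rows = [
    (1, "A", 2000), (2, "A", 2000), (3, "B", 2000),
    (1, "C", 2001), (2, "C", 2001), (3, "A", 2001),
]

def originator_by_task(rows):
    """Count how often each user performed each activity (hypothetical helper)."""
    activities = sorted({activity for _, activity, _ in rows})
    users = sorted({user for user, _, _ in rows})
    matrix = {user: {activity: 0 for activity in activities} for user in users}
    for user, activity, _ in rows:
        matrix[user][activity] += 1
    return activities, matrix

def hamming_distance(profile_a, profile_b):
    """Count activities that exactly one of the two users performed.

    Only "zero or not" matters here; absolute frequencies are ignored.
    """
    return sum((profile_a[key] > 0) != (profile_b[key] > 0) for key in profile_a)

activities, matrix = originator_by_task(rows)
distances = {
    (u, v): hamming_distance(matrix[u], matrix[v])
    for u, v in combinations(sorted(matrix), 2)
}
for pair, distance in distances.items():
    print(pair, distance)
```

A distance of 0 means two users performed exactly the same set of activities. Note that this sketch ignores the year column entirely, so it only compares which activities were done, not when.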
  • Isabel
    Accepted Answer
    Sorry Rob, I didn't notice until now that your question was posted to the XESame forum. I don't really know much about XESame but I can give you hints on the keys you should use in the XES format.

    Your user column should be mapped to the key org:resource (log->trace->event).
    The activity column goes to the key concept:name (log->trace->event).

    However, regarding your "events" and the date column, I'm not sure whether the date should be mapped to the key time:timestamp (log->trace->event) or whether the year can be seen as a case ID (as in "the year 2000 case involves activity A performed by user 1"), which would be the key concept:name in (log->trace). Should you map the year as case ID, you could use the "working together" metric rather than the "similar task" metric.

    Hope I have made myself clear :) Maybe somebody else has better hints for you...

    Kind regards,
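
The key mapping described in this post can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration, not XESame output: it takes the "year as case ID" option, uses 1 January as a placeholder timestamp because the sample data contains only a year, and omits the log-level attributes and extension declarations that a complete XES file would carry. The function name `build_xes` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample rows from the question: (user, activity, year).
rows = [
    (1, "A", 2000), (2, "A", 2000), (3, "B", 2000),
    (1, "C", 2001), (2, "C", 2001), (3, "A", 2001),
]

def build_xes(rows):
    """Group rows into one trace per year and map the columns to XES keys."""
    by_year = defaultdict(list)
    for user, activity, year in rows:
        by_year[year].append((user, activity))

    log = ET.Element("log")
    for year in sorted(by_year):
        trace = ET.SubElement(log, "trace")
        # Assumption: the year acts as the case ID (concept:name on the trace).
        ET.SubElement(trace, "string", key="concept:name", value=str(year))
        for user, activity in by_year[year]:
            event = ET.SubElement(trace, "event")
            # User column -> org:resource, activity column -> concept:name.
            ET.SubElement(event, "string", key="org:resource", value=str(user))
            ET.SubElement(event, "string", key="concept:name", value=activity)
            # Assumption: only the year is known, so 1 January is a placeholder.
            ET.SubElement(event, "date", key="time:timestamp",
                          value=f"{year}-01-01T00:00:00.000+00:00")
    return log

log = build_xes(rows)
print(ET.tostring(log, encoding="unicode"))
```

If real case IDs and timestamps are available, they would replace the year-based trace name and the placeholder date.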
  • Isabel
    edited January 2012 Accepted Answer
    Joos, I also had the idea of concatenating the event name with the year in mind :) Is it possible to achieve that with XESame? Can you also modify (re-map) existing XES/MXML logs with XESame, instead of using data sources like MySQL, CSV, or whatever?
  • JBuijs
    Accepted Answer
    Hi Isabel,

    XESame can easily do this by setting the value of the concept:name of the event to something like:
    activity & ' - ' & year    (which will result in the event name 'A - 2000')
    If you use something like MS Access, Excel or CSV files, this will work. For other back-end database systems (Oracle, PostgreSQL, ...), you should change the '&' into the concatenation operator that system uses ('||' etc.).

    XESame can read in XML (and therefore also XES/MXML) if you use an appropriate JDBC or ODBC driver.
    A quick search came up with this driver that can do this but there are many more (also free) available.

    Hope this answers your question Isabel :)
    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
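
To see what the '||' variant of the concatenation produces, here is a small sketch that runs it against an in-memory SQLite database from Python. SQLite is used only because it ships with Python and accepts '||'; it is not one of the back ends named in the post, and the table name `event_log` and its column names are made up for this example.

```python
import sqlite3

# Sample rows from the question: (user, activity, year).
rows = [
    (1, "A", 2000), (2, "A", 2000), (3, "B", 2000),
    (1, "C", 2001), (2, "C", 2001), (3, "A", 2001),
]

connection = sqlite3.connect(":memory:")
# Hypothetical table layout, chosen only for this illustration.
connection.execute(
    "CREATE TABLE event_log (user_id INTEGER, activity TEXT, year INTEGER)"
)
connection.executemany("INSERT INTO event_log VALUES (?, ?, ?)", rows)

# Same idea as: activity & ' - ' & year, written with '||' instead of '&'.
event_names = [
    name
    for (name,) in connection.execute(
        "SELECT activity || ' - ' || year FROM event_log ORDER BY rowid"
    )
]
connection.close()
print(event_names)
```

Other database systems may need an explicit cast of the year to text before concatenating.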


  • JBuijs
    edited January 2012
    Hi Rob Janzen and Isabel,

    @Isabel: thank you for replying to this thread. I have only minor additions to your suggestions.

    @RobJanzen: First of all, do you also have more detailed timestamps and case IDs in your data set? I'm not sure what your complete data set looks like, or whether the three columns you mentioned only serve as an example of your problem.
    I'm assuming that you have more detailed timestamps and case IDs for your events.
    In that case, you could set the event name to the combination of activity and year, which would result in 'A - 2000' and 'A - 2001' in your example. This allows you to easily distinguish your events.

    Officially, however, you should (or could) create an event classifier that combines the activity and year attributes (just as the default classifier combines the event name and event lifecycle attributes). The mining plug-ins should ideally recognize these classifiers and let you choose which one to use. In practice this is never done, however, so you should stick with concatenating the event name with the year.

    Please let us know if this answers your question Rob!
    Joos Buijs
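
Conceptually, an event classifier of the kind described in this post maps each event to a label built from a chosen list of attribute keys. The sketch below shows that idea in plain Python; it is not the OpenXES or ProM API. The `classify` function, the `year` attribute key, the "complete" lifecycle values and the "+" separator are all assumptions made for this illustration.

```python
def classify(event, keys):
    """Build an event label by joining the values of the given attribute keys."""
    # Assumption: "+" as separator; any unambiguous separator would do.
    return "+".join(str(event[key]) for key in keys)

# Hypothetical events; the lifecycle value and the "year" key are assumed.
events = [
    {"concept:name": "A", "lifecycle:transition": "complete", "year": 2000},
    {"concept:name": "A", "lifecycle:transition": "complete", "year": 2001},
]

# Event name plus lifecycle: both events get the same label.
name_and_lifecycle = ("concept:name", "lifecycle:transition")
# Event name plus year: the two events become distinguishable.
name_and_year = ("concept:name", "year")

default_labels = [classify(event, name_and_lifecycle) for event in events]
yearly_labels = [classify(event, name_and_year) for event in events]
print(default_labels)
print(yearly_labels)
```

The effect is the same as concatenating the year into the event name, except that the original attributes stay untouched and the combination is chosen at analysis time.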

  • Hi Joos and Isabel,

    Yes, that was sample data to try to simplify the problem.  I am looking at registration information for courses and am interested in the social network students generate.

    Thank you for the suggestions! I will see how they play out.
