Mining process log with mixture of deterministic and non-deterministic events

Hi, I am dealing with a process log in which some events are handled by a machine and some by human intervention. All machine events are deterministic (they do not change), while the human part (whenever a human interacts) is highly random. How can I mine such a process?

I am looking for a plugin/technique to mine the process on the basis of events rather than cases. Kindly help if anyone has any clue how to sort this out. I can explain more if needed.


  • Hi,

    Can you provide an example of such a log, and of the model that you would like to discover from that log?

    Kind regards,
  • Hi Eric, thanks for your reply.

    I am sharing an abstracted log as the attached file. It is a typical online chat process between a chatbot server and a customer. The process kicks off with a question raised by the machine, which is answered by the customer. During this process flow, the machine guesses the solution to the customer's query and suggests a workaround to the customer at the end of the process. The 'Answer given by' column in the data indicates whether a reply came from the machine or the customer.

    All the machine-related activities are triggered by an answer from the customer, so the only branching possible in the process is when a customer interacts. The machine part is fixed, and all activities done by the machine are deterministic.

    Query 1: Am I right to say that the machine part and the human part are both activities of this process?

    Query 2: Is there any way in process mining to examine both portions of the process separately, given that all cases are random and there is no strict format in the actual log? I want to see what causes this process to be so lengthy and how to make it more efficient: what changes I can suggest in the machine portion and what in the customer one.

    Apologies for the long reply. Please provide your detailed suggestions. 

  • Hi,

    It seems you do have cases, like 1234 and 5678.

    I would say that the questions asked by the machine correspond to the activities. Based on the answers, a certain route (series of questions) is followed through the process. Using control-flow discovery, you could then discover a process model containing all possible flows. Later on, you could enhance this model with guards, which may be discovered based on the answers as given (also by humans). Possibly, you will need to categorize the answers first. Being plain text (or spoken words), different answers may in fact convey the same information. Also, you can enhance the model with timing information and check where possible bottlenecks are.

    Kind regards,
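
As a rough illustration of the timing suggestion above, the sketch below computes the average time elapsed before each activity (question), which is one simple way to spot candidate bottlenecks. The event tuples, activity names (Q1, Q2, Q3), and timestamps are invented for the example and are not taken from the attached log; only the case ids 1234 and 5678 echo the ones mentioned in this thread.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case id, activity, timestamp). Invented for illustration.
events = [
    ("1234", "Q1", "2020-01-01 10:00:00"),
    ("1234", "Q2", "2020-01-01 10:00:30"),
    ("1234", "Q3", "2020-01-01 10:05:30"),
    ("5678", "Q1", "2020-01-01 11:00:00"),
    ("5678", "Q3", "2020-01-01 11:02:00"),
]


def mean_wait_per_activity(events):
    """Average seconds between an event and its predecessor in the same case,
    grouped by the activity of the later event."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_case[case_id].append((parsed, activity))

    waits = defaultdict(list)
    for trace in by_case.values():
        trace.sort(key=lambda e: e[0])  # order events within the case by time
        for (prev_ts, _), (ts, activity) in zip(trace, trace[1:]):
            waits[activity].append((ts - prev_ts).total_seconds())

    return {a: sum(v) / len(v) for a, v in waits.items()}


result = mean_wait_per_activity(events)
for activity, seconds in sorted(result.items()):
    print(activity, seconds)
```

Activities with a long average wait are where the customer (or the machine) takes longest to respond; a ProM performance plug-in would give a richer view, so treat this only as a quick pre-check.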
  • Hi Eric,
    Thanks for your reply. I have done the control-flow discovery part and have also run ActiTraC and AHC using ProM6 (nightly builds too). Now, at a more advanced stage, I am trying to do conformance analysis to identify bottlenecks, most frequent paths, etc.

    Assuming that the answers are already categorized, what exactly do you mean by 'Later on, you could enhance this model with guards, which may be discovered based on the answers as given (also by humans).'?
    Can you suggest any plugin/documentation/algorithm that would allow me to check conformance of the human and machine parts separately? (Or maybe that is not a good idea?)
  • Hi,

    There are some plug-ins that can add data to a (discovered) Petri net based on the information that is contained in a log. For example, given a Petri net and a log, you can run the "Discovery of the Process-Data Flow (Decision-Tree Miner)" plug-in to get a Petri net with guards. Whether this plug-in is available may depend on your version of ProM. I'm using the latest 6.10 release for this.

    I'm not sure what you mean by conformance of the human part. Humans in the process just provide answers to the questions, right? And these answers could be, well, anything. What would conformance be, then?

    Kind regards,
  • Hi Eric,

    Thanks for your reply. I took some time to get back as I was going through literature and trying to sort out confusions related to conformance.  Below are two queries based on our discussion:

    1) The "Decision-Tree Miner" plugin is not working for me in either ProM 6.10 or the Lite version. After the configuration screen, when I click Continue, it takes me back to the Actions screen. Is there anything I am doing wrong?

    2) Can you suggest any plugin/literature that performs event-level clustering/filtering? For example, in the Explorer view of the log, can I filter my log from Event 1 to Event 15? I need this because I want to slice my large process log into several segments, such as the first 25% of the process, the last 40% of events, etc.

    Thanks once again for your time. I am really not able to handle the log that I am dealing with. Your expert advice will definitely help. 
  • Hi,

    Regarding 2), I would advise using something like R, Python, Excel, or SQL for pre-processing. I don't really understand what you mean by 'Filter from Event 1 to Event 15'. Do you mean to split traces from the first occurrence of 'Event 1' until the first occurrence of 'Event 15'?

    Regarding 1), I just tried the plugin and it works fine. Can you share a reproducible example? So an XES log and a Petri net so I can check this.
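
If the second interpretation in the reply above is the intended one, the pre-processing could look roughly like the following Python sketch. It assumes each trace is already available as a list of activity names; the function name `slice_trace` and the sample trace are hypothetical.

```python
def slice_trace(trace, start, end):
    """Return the part of `trace` from the first occurrence of `start`
    through the first occurrence of `end` at or after it.
    Returns an empty list if either activity is missing."""
    if start not in trace:
        return []
    i = trace.index(start)
    try:
        j = trace.index(end, i)  # search only from the start position onwards
    except ValueError:
        return []
    return trace[i:j + 1]


# Hypothetical trace of activity names for one case.
trace = ["Greeting", "Event 1", "Event 7", "Event 15", "Event 20"]
print(slice_trace(trace, "Event 1", "Event 15"))
```

Applying this to every case and writing the results back out yields a filtered log that can be loaded into ProM as usual.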
  • Hi Mannhardt,

    Thanks for your reply. I have seen your work on ProM and have read your research papers. It's a blessing to have people like you and Eric around to reply to our queries.

    Regarding 1), I have tested the plugin on public data (BPI & Artificial Loan Process) and it works fine, so there may be some issue with my data set. I am reviewing my data set and will try the plugin again.

    Regarding 2), I am using Excel for pre-processing now. I wanted to split my large event log (280 events per case on average) into manageable segments, but I was not sure on what basis to split the cases. One way is to split the cases by percentage of length, such as 25% of the length for each split. But I wanted to see whether there is any literature where the split is done on some other statistical basis. I am searching the literature on this.

    Thanks for your assistance once again. 
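
For reference, the percentage-of-length split described above can be sketched in a few lines of Python. This is only an illustration under the assumption that a case is a list of events in order; `split_by_fraction` is a hypothetical helper, and the 280-event trace is dummy data matching the average case length mentioned in the post.

```python
def split_by_fraction(trace, fractions):
    """Split `trace` into consecutive segments whose lengths follow `fractions`.
    The fractions are expected to sum to 1; the last segment always runs to
    the end of the trace so that no event is lost to rounding."""
    n = len(trace)
    segments = []
    start = 0
    cumulative = 0.0
    for k, fraction in enumerate(fractions):
        cumulative += fraction
        if k == len(fractions) - 1:
            stop = n
        else:
            stop = min(n, round(cumulative * n))
        segments.append(trace[start:stop])
        start = stop
    return segments


# Dummy case with 280 events, split into four segments of 25% each.
case = list(range(280))
segments = split_by_fraction(case, [0.25, 0.25, 0.25, 0.25])
print([len(s) for s in segments])
```

Unequal splits, such as the first 25% and the last 40% mentioned earlier in the thread, can be obtained with fractions like `[0.25, 0.35, 0.40]` and keeping only the first and last segments.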