
java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi,

I am trying to perform process mining on a dataset of approximately 8 million events, with a footprint of about 500 MB in CSV format. When I run the CSV to XES converter plugin, on both my laptop and a Unix machine (each with 8 GB of memory), I get an out-of-memory error. I have set the Java heap option to -Xmx8g, and on my laptop I can see the Java process reach approximately 6.5 GB of RAM; it seems to have processed all or nearly all of the events before it hangs and then crashes with this error.

Any ideas how I can perform Process Mining on a dataset of this size?
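[Editor's note: not from the thread, but one common workaround when a converter exhausts the heap is to split the CSV into case-complete chunks and convert each chunk to XES separately. A minimal sketch, assuming the log has a `case_id` column (the column name, chunk file names, and function name are all hypothetical):]

```python
import csv
import zlib

def split_by_case(src, num_chunks, case_column="case_id"):
    """Split an event-log CSV into chunks, keeping each case's events together.

    Hypothetical sketch: the "case_id" column name is an assumption; adjust
    it to whatever identifies a case/trace in your log.
    """
    with open(src, newline="") as fin:
        reader = csv.DictReader(fin)
        files = [open(f"chunk_{i}.csv", "w", newline="")
                 for i in range(num_chunks)]
        writers = []
        for f in files:
            w = csv.DictWriter(f, fieldnames=reader.fieldnames)
            w.writeheader()
            writers.append(w)
        for row in reader:
            # A stable hash of the case id sends every event of a case
            # to the same chunk, so no trace is split across files.
            idx = zlib.crc32(row[case_column].encode("utf-8")) % num_chunks
            writers[idx].writerow(row)
        for f in files:
            f.close()
```

Each chunk can then be converted and mined on its own, at the cost of losing cross-chunk statistics.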

Comments

  • Hi CFoley,

    In what step exactly does ProM give the out of memory error? Already when loading the CSV into ProM?
    And have you tried it in ProM Lite? This should work better than ProM 6.6.
    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
  • Hi,

    Thanks for your response. It fails at the step where I run the CSV to XES converter.

    Today I have tried using various XESLite versions for the XFactory implementation. These seem to complete, but then I am presented with an error message (below) rather than the log visualizer. I can see the XES file in my workspace, but when I try to view it using the log visualizer, I get the same error again.

    Unable to produce the requested visualization

    Error Message

    java.lang.ArrayIndexOutOfBoundsException: -631296

    Debug Information for Reporting

    Visualizer: @2 Log Summary

    Stack trace:
        java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -631296
        at java.util.concurrent.FutureTask.report(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at javax.swing.SwingWorker.get(Unknown Source)
        at org.processmining.framework.plugin.ProMFuture.get(ProMFuture.java:119)
        at org.processmining.framework.plugin.impl.PluginExecutionResultImpl.synchronize(PluginExecutionResultImpl.java:106)
        at org.processmining.contexts.uitopia.model.ProMView$1.run(ProMView.java:305)
        at java.lang.Thread.run(Unknown Source)

  • Erm, interesting :wink:

    Could you try this on a small part of the event log? And could you maybe share the resulting event log? That would allow us to debug.

    Joos Buijs

  • I tried it with a 50% sample of the cases and it works fine. So it seems it might be due to the size of the full event log.
    I have tried the full log in ProM Lite using a couple of importers (such as "optimized for memory" and "trove + sequential IDs"), but none are working. Is there one particular XESLite plugin that you would recommend for handling large files like this?
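[Editor's note: a case-level sample like the 50% sample mentioned above can be drawn while keeping whole traces together. A sketch, assuming a `case_id` column (the column name, file names, and function name are assumptions):]

```python
import csv
import random

def sample_cases(src, dst, fraction=0.5, case_column="case_id", seed=42):
    """Copy a random fraction of cases (complete traces) from src to dst.

    Hypothetical sketch: the "case_id" column name is an assumption.
    Sampling is decided once per case, so a trace is kept or dropped whole.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    kept = {}                  # case id -> keep/drop decision
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            cid = row[case_column]
            if cid not in kept:
                kept[cid] = rng.random() < fraction
            if kept[cid]:
                writer.writerow(row)
```

Because the decision is made per case rather than per row, the sampled log still contains only complete traces, which matters for most mining algorithms.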
  • Hi CFoley,

    I think that this report by the author of XESLite, Felix Mannhardt, might provide more details
    http://bpmcenter.org/wp-content/uploads/reports/2016/BPM-16-04.pdf

    Also this post by Felix gives some insight:
    http://www.win.tue.nl/promforum/discussion/comment/1487/#Comment_1487

    Hope this helps!

    Joos Buijs

  • Hi,

    Many thanks for your reply. Since I last commented, I have gathered some more details. The XES importer finishes, and I can view the XES file in a Unix terminal; it seems fine.
    When I try to open the Log Summary in ProM, it fails with the array-index-out-of-bounds error mentioned above. With the same file (and the same number of traces), it always reports exactly the same error when opening the Log Summary: java.lang.ArrayIndexOutOfBoundsException: -631800.
    What is strange is that the Trace Variants view opens fine.

    The log has approximately 1.5 million cases (approximately 7 million rows). I tried reducing the number of rows and found that it works at 1 million, but at 1.08 million it produces the same exception again.
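[Editor's note: the manual narrowing described above (works at 1 million rows, fails at 1.08 million) can be driven by writing row-limited prefixes of the CSV and bisecting on n. A sketch; file names are placeholders:]

```python
def take_first_rows(src, dst, n):
    """Copy the header plus the first n data rows of src into dst.

    Hypothetical helper for bisecting the row count at which a tool
    starts to fail: test dst in the tool, then adjust n up or down
    between a known-good and a known-bad size.
    """
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.readline())  # header line
        for i, line in enumerate(fin):
            if i == n:
                break
            fout.write(line)
```

Note that truncating by row can cut the last case mid-trace; for a strict case-complete prefix, stop at the last complete case id instead.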
