
Date format in Convert CSV to XES

Hello all

No matter what pattern I enter in the Date Format() field for the completion time in the Convert CSV to XES plugin, I get a parse error for every date field, like this:

Line: 0:
Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2843, 50, c, 03/01/2010 12:00, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
Error: java.text.ParseException: Could not parse 03/01/2010 12:00

I've tried dd:MM:yyyy HH:mm, "dd:MM:yyyy HH:mm", dd/MM/yyyy HH:mm ....

The file is UTF-8 encoded. I tried on Kubuntu 15.1 and on Windows 7 64-bit, with identical results. Any help is greatly appreciated.

Answers

  • Hi B_G,

    I suspect that the pattern
    dd/MM/yyyy HH:mm
    should work. If not, maybe swap the dd and MM parts.

    Please let me know if this works.

    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
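
To see why swapping the dd and MM parts matters, here is a minimal standalone Java sketch. It assumes the plugin's patterns follow java.text.SimpleDateFormat, which the java.text.ParseException in the error suggests. The class and method names are hypothetical and not part of ProM.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper for illustration; it is not part of ProM.
class DayMonthOrderCheck {

    // Returns the month (1..12) that the pattern reads from the value,
    // or -1 if the value does not match the pattern.
    static int monthOf(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false); // reject out-of-range fields such as month 13
        try {
            Date date = format.parse(value);
            // Uses the default time zone, the same one the format used for parsing.
            Calendar calendar = Calendar.getInstance(Locale.ROOT);
            calendar.setTime(date);
            return calendar.get(Calendar.MONTH) + 1; // Calendar months are 0-based
        } catch (ParseException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String value = "03/01/2010 12:00"; // timestamp from the first post
        String[] patterns = {"dd/MM/yyyy HH:mm", "MM/dd/yyyy HH:mm", "dd:MM:yyyy HH:mm"};
        for (String pattern : patterns) {
            System.out.println(pattern + " -> month " + monthOf(pattern, value));
        }
    }
}
```

Both slash patterns accept this value but read a different month from it, so a successful parse alone does not prove the day and month order is right. The colon pattern is rejected because the separators in the pattern are matched literally against the input.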
  • Hello Joos, thank you for the very prompt answer, but unfortunately this didn't help. After reading some earlier questions on CSV to XES, I also tried specifying the start time, giving no format, and even changing the encoding from UTF-8, but still no luck.
  • Dear,

    Could you post more details on the error? What message does Disco show? How are the dates interpreted?

    Maybe you can also search for an online 'Java datetime format' tester or interpreter to see if that helps...
    Joos Buijs

  • Hello again Joos, and thank you again for your help. The Convert CSV to XES plugin ends with a popup window with

    "warning : Some issues have been detected during conversion " as the header, and in the window the warnings for individual lines, like

    Line: 0:
    Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2843, 50, c, 2010/01/03, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
    Error: java.text.ParseException: Could not parse 2010/01/03

    Line: 1:
    Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2844, 51, r, 2010/01/04, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
    Error: java.text.ParseException: Could not parse 2010/01/04

    The input file starts with :

    CASE;EVENT;ID1;EVNTTYPE;TIME;YR;BRCH1;BRCH2;STATUS;USER;SUM;SUMIDX
    TJ-1000011-1;2843;50;c;2010/01/03 12:00;2010;Motor;Bodily Injury;Closed;ANITA;0;0
    TJ-1000011-1;2844;51;r;2010/01/04 13:00;2010;Motor;Bodily Injury;Closed;ANITA;0;0

    I've described this as yyyy/MM/dd HH:mm, yyyy:MM:dd HH:mm and some other variants, and also tried removing the time part.

    I have also read up on the date format patterns in Java, and it seems to me I'm using them correctly ...


  • Forgot to mention that the plugin delivers an empty XES file after running. I also tried my data and formats on http://www.fileformat.info/tip/java/simpledateformat.htm, and they worked fine there.

  • Dear,

    Thank you for your reply.
    I'll notify the developer of the plug-in to see if he can assist further.
    Joos Buijs

  • Hi,
    thanks for taking the time to report this issue.
    It seems that the rows in your log have different date formats.

    In your example you mention:
    2010/01/03 12:00
    But in the error message the row contains only:
    2010/01/04

    I tried to convert it with the current version (nightly builds) of the CSV importer, and it suggests using the date format:
    yyyy/M/d
    This works, but you lose the 'time' part of the date.

    Apparently, we treat the user-supplied date format as 'non lenient', which means that every date encountered has to exactly match the pattern. I think this is a sensible default, as otherwise 'unexpected data' might sneak in. On the other hand, we should have a configuration option for this (maybe in the next version)

    What you could do until then is use your format 'yyyy/MM/dd HH:mm' and, in the last dialog 'Configure Additional Conversion Settings', set 'Error Handling' to 'Omit Event on Error'. This way only the non-parsable events are omitted, not the whole trace.

    I hope this helps :smile:
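
The effect of exact matching on rows with mixed date formats can be reproduced outside ProM. The following is a minimal sketch under stated assumptions, not the importer's actual code: it uses java.text.SimpleDateFormat with a ParsePosition to require that the whole value is consumed, and the class and method names are hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; it is not the ProM importer's code.
class MixedDateFormats {

    // Returns the parsed date, or null unless the pattern matches the whole value.
    static Date parseStrict(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false); // no rolling over of out-of-range fields
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(value, position); // null on failure, no exception
        boolean consumedAll = position.getIndex() == value.length();
        return (date != null && consumedAll) ? date : null;
    }

    // Returns the first pattern that matches the whole value, or null if none does.
    static String firstMatchingPattern(List<String> patterns, String value) {
        for (String pattern : patterns) {
            if (parseStrict(pattern, value) != null) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("yyyy/MM/dd HH:mm", "yyyy/MM/dd");
        // Both value shapes appear in the thread: with and without a time part.
        for (String value : new String[] {"2010/01/03 12:00", "2010/01/04"}) {
            System.out.println(value + " -> " + firstMatchingPattern(patterns, value));
        }
    }
}
```

With a single pattern, one of the two value shapes always fails. Trying the more specific pattern first and falling back to the date-only one accepts both, which is one way to normalize such a column before import.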
  • Hello,

    I have a similar problem to B_G.
    I use ProM 6.5.1 and am trying to convert my CSV file to XES with the "Convert CSV to XES" module.

    My sample CSV file has 3 columns (UID, TIME and MESSAGE), where the time is in the format "dd.MM.yyyy HH:mm:ss". I assigned UID as the Case column and left the Event column and the Start time blank. For Completion time I kept the preset values, TIME and Date Format().
    In the second step I left all values at their presets.

    After clicking Finish, I get a parse error for every entry:
    Line: 0:
    Skipping trace UA0005716, could not convert[UA0005716, 04.09.2015 04:35:05, GET /ais/login.do]
    Error: java.text.ParseException: Could not parse 04.09.2015 04:35:05

    What am I doing wrong?
    Thanks in advance,

    Anton

  • Dear Anton,

    My first recommendation is to see if ProM lite works better. I know some bugs were fixed in the CSV importer.

    Otherwise, please double-check the date format against your time values. They must match, or the timestamps cannot be interpreted correctly.
    Joos Buijs

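
One way to double-check a pattern against sample values outside ProM is a round trip: parse the value, format it back, and compare with the original. This is a minimal sketch assuming the patterns follow java.text.SimpleDateFormat. The class and method names are hypothetical, and the second sample value in main is made up for illustration (a day above 12 exposes a swapped day and month).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; it is not part of ProM.
class PatternSampleCheck {

    // True if the value parses strictly and formats back to exactly the same text.
    static boolean roundTrips(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // avoid daylight-saving gaps
        try {
            return format.format(format.parse(value)).equals(value);
        } catch (ParseException e) {
            return false;
        }
    }

    // True only if every sample value round-trips with the pattern.
    static boolean matchesAll(String pattern, String... samples) {
        for (String sample : samples) {
            if (!roundTrips(pattern, sample)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String fromThread = "04.09.2015 04:35:05";   // value from the error message
        String hypothetical = "24.09.2015 14:00:00"; // made-up value with day > 12
        String[] patterns = {"dd.MM.yyyy HH:mm:ss", "MM.dd.yyyy HH:mm:ss", "dd.MM.yyyy HH:mm"};
        for (String pattern : patterns) {
            System.out.println(pattern + " -> " + matchesAll(pattern, fromThread, hypothetical));
        }
    }
}
```

The round trip also catches a pattern that is too short: SimpleDateFormat.parse(String) may stop before the end of the input, so a pattern without seconds parses the value but formats back to different text.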
  • Hi ProMers

    We seem to be having a similar problem, which we have traced to our date fields (format: dd-MM-yyyy): the dataset loads when the date field is absent. When we load a CSV file generated in R that contains a single date field, the file loads without problems (attachment R_3_dates). When we load an identical file created in SAS (attachment SAS_3_dates), we get the following error:

    Sorting CSV file (0.00 MB) by case and time using maximal 546 MB of memory ...

    Finished sorting in 0 seconds

    Reading cases ...

    java.lang.NullPointerException

    Notepad and Notepad++ identify both of these files as being UTF-8 encoded.

    When we add the date column from the SAS file to the R file it still loads using one or both dates (start and end; MSExcel with Notepad to replace '/' with '-';  N.B., we can't get dates with '/' to load at all).

    When we add the date column from the R file to the SAS file (MSExcel with Notepad to replace '/' with '-'), the file loads using one or both dates (N.B, making this change in MSExcel and Notepad on the naïve SAS generated file does not help it to load).  It doesn’t matter in which order the data is added (column inserted or appended).

    When we output the file from SAS as UTF-8 ( data Mydataset (encoding=UTF8); ... ) we get the same problem. 

    We can’t make sense of this.  Can anyone help?

    Regards

    Peter

  • Which version of ProM are you using? The latest nightly build should include some improvements in the error reporting, which might help investigate this problem.

    Regarding the NullPointerException, can you start ProM from the batch file (ProM.bat) and see if there is more information on the error? For example, a stack trace would be helpful.

    Did you attach files to the post? I cannot see them. Otherwise feel free to send me the file by email at f.mannhardt@tue.nl. I will treat it confidentially.

  • Dear Peter, thanks for sending me the file. Unfortunately, I cannot reproduce the problem in any of the ProM versions. It just works fine with both files (sas and non-sas) in ProM 6.5.1, ProM Lite and the current nightly build.

    Changing the charset to UTF-8 or ISO encoding does not matter as there are no special characters in this log.

    I'm using the attached configuration. Could you try to start ProM with the batch file "ProM651.bat" and send me the information printed in the terminal when the error occurs?
