Date format in Convert CSV to XES
No matter what pattern I enter for Dateformat() under completion time in the Convert CSV to XES plugin dialog, I get a parse error for all date fields, like this:
Line: 0:
Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2843, 50, c, 03/01/2010 12:00, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
Error: java.text.ParseException: Could not parse 03/01/2010 12:00
I've tried dd:MM:yyyy HH:mm, "dd:MM:yyyy HH:mm", dd/MM/yyyy HH:mm ....
The file is in UTF-8 format; I tried with identical results on Kubuntu 15.1 and Windows 7 64. Any help greatly appreciated.
Answers
-
Hi B_G,
I suspect that the pattern
dd/MM/yyyy HH:mm
should work. If not, maybe swap the dd and MM parts.
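For readers following along, the effect of swapping dd and MM can be checked outside ProM. The sketch below uses the JDK's java.text.SimpleDateFormat on the value from the error message; the class and method names are made up for illustration, and whether the plugin parses dates the same way internally is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of ProM.
class DayMonthOrder {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Parses the value with the pattern and returns {day, month (1-12), year}.
    public static int[] dayMonthYear(String pattern, String value) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(UTC);
        format.setLenient(false); // reject impossible values such as month 13
        Calendar cal = new GregorianCalendar(UTC, Locale.ROOT);
        cal.setTime(format.parse(value));
        return new int[] {
            cal.get(Calendar.DAY_OF_MONTH),
            cal.get(Calendar.MONTH) + 1, // Calendar months are 0-based
            cal.get(Calendar.YEAR)
        };
    }

    public static void main(String[] args) throws ParseException {
        String value = "03/01/2010 12:00"; // value taken from the error message
        for (String pattern : new String[] {"dd/MM/yyyy HH:mm", "MM/dd/yyyy HH:mm"}) {
            int[] dmy = dayMonthYear(pattern, value);
            System.out.println(pattern + " -> day " + dmy[0] + ", month " + dmy[1] + ", year " + dmy[2]);
        }
    }
}
```

Both patterns accept this particular value, but they disagree on which number is the day, so a successful parse alone does not prove the order is right.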
Please let me know if this works.
Joos Buijs
Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
Previously Assistant Professor in Process Mining at Eindhoven University of Technology
-
Hello Joos - Thank you for the very prompt answer, but unfortunately this didn't help. After reading some earlier questions on csv to xes I also tried specifying the start time, giving no format, and even changing the encoding from UTF-8, but still no luck
-
Dear,
Could you post more details on the error? What message does Disco show? How are the dates interpreted?
Maybe you can also search for an online 'Java datetime format' tester or interpreter to see if that helps...
Joos Buijs
-
Hello again Joos, and thank you again for your help. The Convert CSV to XES plugin ends with a popup window with
"warning: Some issues have been detected during conversion" as a header, and in the window the warnings for individual lines, like
Line: 0:
Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2843, 50, c, 2010/01/03, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
Error: java.text.ParseException: Could not parse 2010/01/03
Line: 1:
Skipping trace TJ-1000011-1, could not convert[TJ-1000011-1, 2844, 51, r, 2010/01/04, 2010, Motor, Bodily Injury, Closed, ANITA, 0, 0]
Error: java.text.ParseException: Could not parse 2010/01/04
The input file starts with:
CASE;EVENT;ID1;EVNTTYPE;TIME;YR;BRCH1;BRCH2;STATUS;USER;SUM;SUMIDX
TJ-1000011-1;2843;50;c;2010/01/03 12:00;2010;Motor;Bodily Injury;Closed;ANITA;0;0
TJ-1000011-1;2844;51;r;2010/01/04 13:00;2010;Motor;Bodily Injury;Closed;ANITA;0;0
This I've described as yyyy/MM/dd HH:mm, yyyy:MM:dd HH:mm and some other variants; I also tried removing the timestamp.
I have also read up on the format for dates in Java, and it seems to me I'm using it correctly ...
-
Forgot to mention that the plugin delivers an empty XES file after running. I also tried my data and formats on this site, http://www.fileformat.info/tip/java/simpledateformat.htm, and they worked all right there.
-
Dear,
Thank you for your reply.
I'll notify the developer of the plug-in to see if he can assist further.
Joos Buijs
-
Hi,
thanks for taking the time to report this issue.
It seems that the rows in your log have different date formats.
In your example you mention: 2010/01/03 12:00
But in the error message the row contains only: 2010/01/04
I tried to convert it with the current version (nightly builds) of the CSV importer, and it suggests using the date format: yyyy/M/d
This works, but you lose the 'time' part of the date.
Apparently, we treat the user-supplied date format as 'non-lenient', which means that every date encountered has to exactly match the pattern. I think this is a sensible default, as otherwise 'unexpected data' might sneak in. On the other hand, we should have a configuration option for this (maybe in the next version).
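The behaviour described above can be reproduced with the JDK's java.text.SimpleDateFormat. The sketch below runs both patterns against both values from this thread; the class and method names are hypothetical, and it is only an assumption that the importer behaves like the plain JDK parser. Note that the JDK's own setLenient flag governs out-of-range field values; a value with a missing time part fails against a pattern that includes HH:mm regardless of that flag.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of ProM.
class PatternVersusValue {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Returns the parsed date, or null when the value does not fit the pattern.
    public static Date tryParse(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(UTC);
        format.setLenient(false);
        try {
            return format.parse(value);
        } catch (ParseException e) {
            return null;
        }
    }

    // Hour of day (0-23) of the parsed date, read in the same time zone.
    public static int hourOf(Date date) {
        Calendar cal = new GregorianCalendar(UTC, Locale.ROOT);
        cal.setTime(date);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        String[] patterns = {"yyyy/MM/dd HH:mm", "yyyy/M/d"};
        String[] values = {"2010/01/03 12:00", "2010/01/04"};
        for (String pattern : patterns) {
            for (String value : values) {
                Date date = tryParse(pattern, value);
                String result = (date == null) ? "parse error" : "ok, hour = " + hourOf(date);
                System.out.println(pattern + " | " + value + " -> " + result);
            }
        }
    }
}
```

With the longer pattern, the value without a time fails. With yyyy/M/d, both values parse, but the trailing " 12:00" is ignored and the hour comes back as 0, which matches the loss of the time part mentioned above.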
What you could do until then is to use your format 'yyyy/MM/dd HH:mm' and, in the last dialog, 'Configure Additional Conversion Settings', set 'Error Handling' to 'Omit Event on Error'. This way only non-parsable events are omitted, but not the whole trace.
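The idea of omitting only the offending event can be pictured with a short sketch. This is not the plugin's code: the class and method names are invented for illustration, and it simply skips values that the JDK's SimpleDateFormat rejects while keeping the rest.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of an "omit event on error" policy, not ProM code.
class OmitEventSketch {
    // Parses every value it can; unparsable values are collected in 'skipped'
    // instead of aborting the whole batch.
    public static List<Date> parseKeepingGood(String pattern, List<String> values, List<String> skipped) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        List<Date> parsed = new ArrayList<>();
        for (String value : values) {
            try {
                parsed.add(format.parse(value));
            } catch (ParseException e) {
                skipped.add(value); // drop this event only
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Timestamps as they appear in this thread: two with a time, one without.
        List<String> values = Arrays.asList("2010/01/03 12:00", "2010/01/04", "2010/01/04 13:00");
        List<String> skipped = new ArrayList<>();
        List<Date> parsed = parseKeepingGood("yyyy/MM/dd HH:mm", values, skipped);
        System.out.println("kept " + parsed.size() + " events, skipped " + skipped);
    }
}
```

The trade-off is that skipped events silently disappear from the trace, so it is worth checking the skipped list before trusting the resulting log.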
I hope this helps.
-
Hello,
I have a similar problem to B_G's.
I use ProM 6.5.1 and am trying to convert my CSV file to XES with the "Convert CSV to XES" module.
My sample CSV file has 3 columns (UID, TIME and MESSAGE), where the time is in the format "dd.MM.yyyy HH:mm:ss". I assigned UID as the Case column and left the Event column blank, as well as Start time. For Completion time I left the preset values (TIME) and Date Format().
In the second step I left all values as preset.
After clicking Finish, I get a parsing error (for all entries):
Line: 0:
Skipping trace UA0005716, could not convert[UA0005716, 04.09.2015 04:35:05, GET /ais/login.do]
Error: java.text.ParseException: Could not parse 04.09.2015 04:35:05
What am I doing wrong?
Thanks in advance,
Anton
-
Dear Anton,
My first recommendation is to see if ProM lite works better. I know some bugs were fixed in the CSV importer.
Otherwise, please double-check the time conversion format against your time values; they must match, otherwise the timestamps cannot be interpreted correctly.
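One way to do that check outside ProM is to test whether a pattern consumes a sample value completely. The sketch below uses the JDK's SimpleDateFormat with a ParsePosition; the class and method names are made up for illustration, and it says nothing about what the plugin does when the Date Format field is left at its preset.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of ProM.
class PatternCheck {
    // True only if the pattern parses the value and uses up every character of it.
    public static boolean matchesExactly(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(value, position);
        return date != null && position.getIndex() == value.length();
    }

    public static void main(String[] args) {
        String value = "04.09.2015 04:35:05"; // value from the error message above
        String[] patterns = {"dd.MM.yyyy HH:mm:ss", "dd.MM.yyyy HH:mm", "dd/MM/yyyy HH:mm:ss"};
        for (String pattern : patterns) {
            System.out.println(pattern + " -> " + matchesExactly(pattern, value));
        }
    }
}
```

Only the first pattern covers the whole value; the second leaves the seconds unparsed and the third expects '/' where the data has '.'.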
Joos Buijs
-
Hi ProMers
We seem to be having a similar problem that we have identified as occurring in our date fields (format: dd-MM-yyyy ; i.e., the dataset loads in the absence of the date field). When we try to load a CSV file generated in R that contains a single date field the file loads - no problem (Attachment R_3_dates). When we try to load an identical file created in SAS (Attachment SAS_3_dates) we get the following error:
Sorting CSV file (0.00 MB) by case and time using maximal 546 MB of memory ...
Finished sorting in 0 seconds
Reading cases ...
java.lang.NullPointerException
Notepad and Notepad++ identify both of these files as being UTF-8 encoded.
When we add the date column from the SAS file to the R file it still loads using one or both dates (start and end; MSExcel with Notepad to replace '/' with '-'; N.B., we can't get dates with '/' to load at all).
When we add the date column from the R file to the SAS file (MSExcel with Notepad to replace '/' with '-'), the file loads using one or both dates (N.B, making this change in MSExcel and Notepad on the naïve SAS generated file does not help it to load). It doesn’t matter in which order the data is added (column inserted or appended).
When we output the file from SAS as UTF-8 ( data Mydataset (encoding=UTF8); ... ) we get the same problem.
We can’t make sense of this. Can anyone help?
Regards
Peter
-
Which version of ProM are you using? The latest nightly build should include some improvements in the error reporting, which might help investigate this problem.
Regarding the NullPointerException, can you start ProM from the batch file (ProM.bat) and see if there is more information on the error? For example, a stack trace would be helpful.
Did you attach files to the post? I cannot see them. Otherwise, feel free to send me the files by email at f.mannhardt@tue.nl. I will treat them confidentially.
-
Dear Peter, thanks for sending me the file. Unfortunately, I cannot reproduce the problem in any of the ProM versions. It just works fine with both files (sas and non-sas) in ProM 6.5.1, ProM Lite and the current nightly build.
Changing the charset to UTF-8 or ISO encoding does not matter as there are no special characters in this log.
I'm using the attached configuration. Could you try to start ProM with the batch file "ProM651.bat" and send me the information printed in the terminal when the error occurs?