ProM5: Hamming distance in Social Network Miner (Similar task)
Hi all,
While trying to understand how the social network miner works, I ran into some trouble with the "Similar task" metrics. In particular, it seems that the Hamming distance implementation differs from the definition given in the corresponding paper.
In practice, instead of the distance function defined in the paper:
d(x,y) = 0 if ((x>0 && y>0) || (x == y == 0))
= 1 otherwise
in UtilOperation.java, line 110, the following distance function is implemented:
d'(x,y) = 0 if (x == y)
= 1 otherwise
that is, if A performs activity act 1 time and B performs the same activity 2 times, their distance with respect to act is 0 for d and 1 for d'.
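To make the difference between the two definitions concrete, here is a minimal Java sketch of both per-activity functions as they are written above. The class and method names (HammingPerActivity, paperDistance, codeDistance) are hypothetical and are not taken from UtilOperation.java; the sketch also assumes the counts are non-negative.

```java
// Sketch only: names are hypothetical, not ProM's actual code.
final class HammingPerActivity {

    // d(x, y) as defined in the paper: 0 when both originators performed
    // the activity at least once, or neither performed it; 1 otherwise.
    static int paperDistance(double x, double y) {
        boolean bothPerformed = x > 0 && y > 0;
        boolean neitherPerformed = x == 0 && y == 0;
        return (bothPerformed || neitherPerformed) ? 0 : 1;
    }

    // d'(x, y) as read from the code: 0 only when the two values are equal.
    static int codeDistance(double x, double y) {
        return (x == y) ? 0 : 1;
    }

    public static void main(String[] args) {
        // A performs the activity once, B performs it twice.
        System.out.println("d(1,2)  = " + paperDistance(1, 2));
        System.out.println("d'(1,2) = " + codeDistance(1, 2));
    }
}
```

On raw counts the two functions disagree exactly when both counts are positive but unequal, which is the case described above.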
Is my observation correct? I am not sure about d', which I extracted from the code. If I am right, why does the implemented distance differ from the one in the paper?
Moreover, the calculation of the overall distance between originators A and B also seems strange to me:
(column - temp) / column
(line 114), because a greater temp (which counts how many times activities differ) leads to a smaller value, while I would expect
temp/column
which assigns a greater value to pairs having a greater distance.
In this case it is simply an issue of inversion, but again I think there is a discrepancy between the paper and the implementation. This is not a real problem, except that a user cannot be sure what a greater or lower value on the edge from A to B means (maybe the convention is written down somewhere and I did not find it).
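The two conventions can be put side by side in a short Java sketch. It uses plain arrays as a stand-in for two rows of the originator-by-activity matrix and assumes those rows have equal, non-zero length; the names HammingOverall, countDifferences, distance and similarity are hypothetical, and only the variable name temp and the two formulas come from the discussion above.

```java
// Sketch only: plain arrays stand in for matrix rows; names are hypothetical.
final class HammingOverall {

    // Number of activities (columns) on which the two rows differ.
    static int countDifferences(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("rows must have equal, non-zero length");
        }
        int temp = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                temp++;
            }
        }
        return temp;
    }

    // temp / column: grows as the originators differ on more activities.
    static double distance(double[] a, double[] b) {
        return (double) countDifferences(a, b) / a.length;
    }

    // (column - temp) / column: grows as the originators agree on more activities.
    static double similarity(double[] a, double[] b) {
        return (double) (a.length - countDifferences(a, b)) / a.length;
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 1, 1};
        double[] b = {1, 0, 0, 1};
        System.out.println("temp / column            = " + distance(a, b));
        System.out.println("(column - temp) / column = " + similarity(a, b));
    }
}
```

The two values always sum to 1, so the implemented formula can be read as a similarity (1 minus the normalized Hamming distance): a larger edge weight would then mean more similar task profiles.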
Comments

I think you misunderstood the code.
Line 110 checks whether the two values are equal. If they are different, it increments the variable 'temp'. The Hamming value is then obtained as (column - temp) / column.
Minseok.

The point is that temp (according to the paper, if I'm right) should be incremented not when the values are different, but when one value is zero and the other is not.

Before line 110, the code preprocesses the matrix (please see m.forEachNonZero()), changing every value greater than 0 into 1. Checking for a difference is therefore enough: when one value is zero and the other is not, "temp" is incremented by 1.
PS. When I tested the code with test cases, it worked well. Could you send me an example that reproduces the problem? It would be a big help for me in fixing it.

Thanks to your help I realized that I was wrong. The point is that I was not able to see how the cern package is implemented.
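The preprocessing argument from the comments can be checked with a small Java sketch. It does not use the cern package: a plain loop over an array stands in for the m.forEachNonZero() step, and all names (BinarizeThenCompare, binarize, paperDistance, equalityDistance, agreesAfterBinarization) are hypothetical. It assumes non-negative counts and rows of equal length.

```java
import java.util.Arrays;

// Sketch only: a plain-array stand-in for the matrix preprocessing step.
final class BinarizeThenCompare {

    // Replace every value greater than 0 with 1, leaving zeros untouched.
    static double[] binarize(double[] row) {
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = row[i] > 0 ? 1.0 : 0.0;
        }
        return out;
    }

    // The paper's per-activity distance, applied to raw counts.
    static int paperDistance(double x, double y) {
        return ((x > 0 && y > 0) || (x == 0 && y == 0)) ? 0 : 1;
    }

    // Plain equality check, as described for line 110.
    static int equalityDistance(double x, double y) {
        return (x == y) ? 0 : 1;
    }

    // True when the equality check on binarized rows matches the paper's
    // distance on the raw rows for every activity.
    static boolean agreesAfterBinarization(double[] a, double[] b) {
        double[] binA = binarize(a);
        double[] binB = binarize(b);
        for (int i = 0; i < a.length; i++) {
            if (paperDistance(a[i], b[i]) != equalityDistance(binA[i], binB[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 3, 0};
        double[] b = {2, 0, 0, 5};
        System.out.println("binarized A: " + Arrays.toString(binarize(a)));
        System.out.println("binarized B: " + Arrays.toString(binarize(b)));
        System.out.println("definitions agree: " + agreesAfterBinarization(a, b));
    }
}
```

Once every positive count is mapped to 1, two entries are equal exactly when both are positive or both are zero, so the equality test coincides with the paper's definition.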