hashing logs

abinash · June 2011

hey guys,

I want to cluster a group of texts, specifically log messages. I want to assign a hash value to each string in such a way that two texts in the same cluster have values closer to each other than to the values of texts in other clusters. Suppose there are many texts like:

'user1 logged in'

'hari logged in vsat1'

.........................

..........................

'hari logged in isat'

'user2 logged in'

Well, what I need is to assign a value to every text such that 'user1 logged in' and 'user2 logged in' get similar values, and 'hari logged in vsat1' and 'hari logged in isat' get similar values.

There could be thousands of logs or texts, and it is almost impossible to compare each one with every other. Do you have any ideas, given that the texts may be context-free and of any type? I don't want to go towards NLP. What I want is just to devise an algorithm that can give a suitable value to each text.
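One family of techniques that matches this description is locality-sensitive hashing. The sketch below uses SimHash, which is a swapped-in illustration and not something proposed in this thread. It assumes whitespace tokenisation, lowercasing, a 64-bit fingerprint, and SHA-256 as the per-token hash; the names `token_hash`, `simhash` and `hamming` are made up for the example.

```python
import hashlib

BITS = 64  # fingerprint width; an arbitrary choice for this sketch


def token_hash(token):
    """Hash one token to a 64-bit integer (SHA-256 is only a stable mixer here)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Return a 64-bit SimHash fingerprint of the whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        h = token_hash(token)
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:  # majority of tokens had this bit set
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two messages that share most of their tokens tend to get fingerprints with a small Hamming distance, so `hamming(simhash(a), simhash(b))` can serve as the "similar value" test. Token order is ignored, and very short messages give noisy fingerprints, so the distance cutoff would have to be tuned on real logs.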

JBuijs · June 2011

Hi Abinash,

I'm not sure, but do the replies of JC in your other topic answer your question?
E.g., to use existing linguistic clustering techniques?

abinash · June 2011

Hey JBuijs,

Actually, the other topic was about my idea for another approach to the problem. Ajay and I are both members of this project group. I want to classify the logs according to their messages only, and I want to classify streaming logs in a single pass, without further iterations. This is just a thought, but if it can be done, it would be the simplest possible way.

The idea is to map the strings to hash values such that two related strings/logs have similar values. I would take about 1000 logs to find the patterns this way, and then I hope classifying the streaming logs in one pass would be easier to accomplish.
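To make the one-pass idea concrete, here is a minimal sketch of leader clustering, a swapped-in technique and not one named in this thread. Each incoming message is compared only against one stored representative per cluster, never against every earlier message. The class name `OnePassClusterer`, the Jaccard similarity on token sets, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class OnePassClusterer:
    """Assigns each message to a cluster as it arrives, in a single pass."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cutoff; tune on a sample of logs
        self.leaders = []           # one token set per cluster (its first member)

    def assign(self, message):
        tokens = frozenset(message.lower().split())
        best_id, best_sim = None, 0.0
        for cluster_id, leader in enumerate(self.leaders):
            sim = jaccard(tokens, leader)
            if sim > best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id
        # No leader is close enough: this message starts a new cluster.
        self.leaders.append(tokens)
        return len(self.leaders) - 1


clusterer = OnePassClusterer()
stream = ["user1 logged in", "hari logged in vsat1",
          "hari logged in isat", "user2 logged in"]
labels = [clusterer.assign(m) for m in stream]
print(labels)  # [0, 1, 1, 0]
```

The cost per message grows with the number of clusters rather than the number of logs seen so far. The result depends on arrival order and on the threshold, so the first 1000 or so logs could be used to pick a threshold before switching to the stream.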

The other approach could involve syntactic analysis or some contextual basis, which would include some form of NLP. So I thought I could do it in a simpler and more efficient way. The efficiency of the classification depends on the efficiency of the hash function. So I hope you can suggest some techniques for mapping strings (general, unstructured, context-free) by effective hashing.

If the idea seems somewhat immature, please help me. I am just a bachelor-level student without much experience, but I promise I can learn and will try hard.

Thank you!
