Indexing IM logs with Elasticsearch

Remember my old project for processing instant messaging logs? Probably not, because I wrote about it five years ago. Well, the project is only mostly dead: every once in a while I still work on it.

I mostly use it as an excuse to learn technologies that are used outside of the Google bubble. One that really impressed me, both with how well it works and how easy it is to set up, was Elasticsearch. Elasticsearch is a search engine: you give it your documents, it indexes them, and it lets you query them fast. There are other projects that do this for you, but ES can …
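To make "indexes them and enables you to query them fast" a bit more concrete, here is a toy sketch of the general idea behind a search index: an inverted index that maps each word to the documents containing it. This is not how Elasticsearch is implemented and it does not use its API; `TinyIndex`, `tokenize` and the sample messages are made up purely for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # token -> set of ids of the documents that contain it
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        """Record every token of the document under its id."""
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing all query tokens."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # Start from the first token's postings and intersect with the rest.
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


idx = TinyIndex()
idx.index(1, "are we still on for lunch tomorrow?")
idx.index(2, "lunch was great, thanks")
idx.index(3, "see you tomorrow")
print(idx.search("lunch tomorrow"))
```

Because a lookup only touches the postings of the query words, it stays quick even when there are many messages. A real search engine adds a lot on top of this, such as relevance scoring and text analysis.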

State of IM on Windows

Since I recently reinstalled Windows, I decided to see whether there are better alternatives to some of the programs I use. First of all, I automated the process of installing my programs. And, after a few days without any IM client on my computer, I decided to see how things have evolved over the past year.

tl;dr: Not at all. I'm sticking with Trillian.

I have a few requirements that an IM program must meet before I'll use it. First of all, it has to be multi-protocol (at least Y!M, GTalk and Facebook). So things like Yahoo Messenger, Windows Live Messenger, etc. are out from the start. In …

Processing IM logs

For a few years now, I've kept my IM archives. I didn't really have a purpose; I just thought that it might be fun to look back one day and see what kind of discussions I had. Well, now I have 150 MB of logs from Digsby, Trillian and Pidgin, and there is no way I'm ever going to read all of that again. But in light of a few things I learned recently (the Coursera NLP and ML courses), I am going to try to visualize and analyze my archives in a mathematical way. That's right, I'm reducing you to numbers. :D At least what we've discussed …