Knol got
equation editing a few days ago, and to celebrate I've written up a knol on one of my favorite CS tricks,
simhashing using sets. It's a technique I picked up at Google, and it's extremely useful for grouping together similar things in a large dataset.
Suppose you have a huge number of items that you would like to group together by a fuzzy notion of similarity. Suppose the only tool available to you is a key-value store. Suppose you only have the resources to consider each object once. Never fear, simhashing is here!
3 comments:
Wow. Neat! Thank you -- I had never seen this.
Hi Ryan,
It seems I'm a few date too late to read your article on knol.
Do you have a copy of the article somewhere else?
Regards,
Markus
Thanks for the note! I've updated the link to the new location.
Post a Comment