Simple_simhash

A pure ANSI-C implementation of calculating a SimHash over 4-byte tuples (including multiplicities) for a given byte stream. Simple and reasonably fast, no dynamic memory allocations (outside of some stack usage). Uses a counting bloom filter to count multiplicities while keeping memory consumption constant.
Alternatives To Simple_simhash
Select To Compare


Popular Text Processing Categories

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.
C
Ansi