Kenneth Marshall wrote:


How likely is it that you will get a hash collision, two strings that are
different that will hash to the same value? To avoid this requires a very
large hash key (128 bits, minimum)- otherwise you get into birthday attack
problems. With a 32-bit hash, the likelyhood is greater than 50% that two
strings in a collection of 100,000 will hash to the same value. With a
64-bit hash, the likelyhood is greater than 50% that two strings in a
collection of 10 billion will has to same value. 10 billion is a large
number, but not an unreasonable number, of strings to want to put into a
hash table- and it's exactly this case where the O(1) cost of hashtables
starts being a real win.

Brian

Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.


Ah, OK- I misunderstood you. I thought you were saying that the hash
values would need to be unique, and you wouldn't check the original
values at all. My bad.

Brian

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

People

Translate

site design / logo © 2021 Grokbase