Kenneth Marshall wrote:

I understand that a hash value is a many-to-one mapping. That is the

point of the flag in the index. The flag means that there is only one

item in the heap corresponding to that hash value. In this case we

know that the value in the heap is the correct one and a possibly

very expensive string comparison can be skipped. Given that the hash

function is doing its job, almost every string comparison can be skipped.

How long would it take to compare 1-32K of data? How much CPU usage?

With this field in place, you only need to check tuple visibility.

I understand that a hash value is a many-to-one mapping. That is the

point of the flag in the index. The flag means that there is only one

item in the heap corresponding to that hash value. In this case we

know that the value in the heap is the correct one and a possibly

very expensive string comparison can be skipped. Given that the hash

function is doing its job, almost every string comparison can be skipped.

How long would it take to compare 1-32K of data? How much CPU usage?

With this field in place, you only need to check tuple visibility.

are different that will hash to the same value? To avoid this requires

a very large hash key (128 bits, minimum)- otherwise you get into

birthday attack problems. With a 32-bit hash, the likelyhood is greater

than 50% that two strings in a collection of 100,000 will hash to the

same value. With a 64-bit hash, the likelyhood is greater than 50% that

two strings in a collection of 10 billion will has to same value. 10

billion is a large number, but not an unreasonable number, of strings to

want to put into a hash table- and it's exactly this case where the O(1)

cost of hashtables starts being a real win.

Brian