On 22 June 2013 14:48, Stephen Frost wrote:

Based on your argument that we want to have a bucket load which is, on
average, the size of NTUP_PER_BUCKET, I have to '-1' on this.
That is not my argument. I am pointing out that the comments claim the
code does that, and yet the code does not.

So either we apply the patch to make the code fit the comment, or we
change the comment.

Tom's wish for an average tuples per bucket of 10 may be connected to
that error; I can't say. But we definitely don't end up with an
average of 10 in many cases, which is the basis for previous poor
performance reports.
What we
want is to size a table large enough that we never have any true
collisions (cases where different 32-bit hash values end up in the same
bucket) *for all tuples being hashed*, that includes both the side
building the hash table and the side doing the scan. This should be
done when we can avoid batching- if we don't have enough to avoid
batching while having such a large table, we should consider adjusting
the hash table size down some because, in those cases, we're memory

When we have enough memory to avoid batching, we never want to have to
check down through a bucket for a tuple which doesn't exist- we should
simply be able to look in the hash table itself, determine (with a
single fetch) that there are no entries for that hash value, and throw
the tuple away and move on to the next.
The bottom line is that you're hoping for perfection. If we knew
exactly how many tuples were in the hash table then we could size it
precisely for the hash join we are about to perform. We don't know
that, so we can't size it precisely. Worse than that is that we know
we badly estimate the number of buckets on a regular basis.

That gives us three choices: if we have a hash table fixed without
prior knowledge then we either (0) we continue to underallocate them
in many cases or (1) potentially overallocate buckets. Or (2) we learn
to cope better with the variation in the number of entries. If we rule
out the first two we have only the last one remaining.

(1) will always be a heuristic, and as you point out could itself be
an extreme setting.

So I think that (2) is the best route: Given that we know with much
better certainty the number of rows in the scanned-relation, we should
be able to examine our hash table after it has been built and decide
whether it would be cheaper to rebuild the hash table with the right
number of buckets, or continue processing with what we have now. Which
is roughly what Heikki proposed already, in January.

  Simon Riggs http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 24 | next ›
Discussion Overview
grouppgsql-hackers @
postedJun 22, '13 at 1:15p
activeJul 3, '13 at 8:09a



site design / logo © 2021 Grokbase