Currently, we restrict btree index tuples to a size that ensures three of
them will fit on a page. The motivation for this is the following two
considerations:

1. In a non-rightmost page, we need to include a "high key", or page
boundary key, that isn't one of the useful data keys.

2. In a non-leaf page, there had better be at least two child pages
(downlink entries), else we have failed to subdivide the page's key
range at all, and thus there would be a nonterminating recursion.

However: a non-leaf page actually has one more pointer than key,
eg a page with three children needs only two data keys:

---------------- entire key range assigned to page ------------------

-- range 1 -- boundary key -- range 2 -- boundary key -- range 3 --
      v                           v                          v
child page 1               child page 2               child page 3

We implement this by having the first data "tuple" on a non-leaf page
contain only a downlink TID and no key data, ie it's just the header.
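
To illustrate, here is a minimal C sketch of one descent step through such
a page; the DemoItem struct and choose_child function are illustrative
stand-ins, not the server's actual nbtree code:

    /*
     * Illustrative sketch only -- DemoItem and choose_child are made-up
     * names, not PostgreSQL's nbtree API.
     */
    typedef struct
    {
        int      key;          /* simplified single-column integer key;
                                * meaningless for item 0 on a non-leaf page */
        unsigned child_blkno;  /* downlink: block number of the child page */
    } DemoItem;

    /*
     * One descent step: follow the last item whose key is <= search_key.
     * Item 0 carries no key and acts as "minus infinity", so it matches
     * everything and the scan starts at item 1.  (The real code does a
     * binary search, but the boundary rule is the same.)
     */
    static unsigned
    choose_child(const DemoItem *items, int nitems, int search_key)
    {
        int best = 0;               /* item 0: the keyless "minus infinity" */

        for (int i = 1; i < nitems; i++)
        {
            if (items[i].key > search_key)
                break;              /* items are kept in key order */
            best = i;
        }
        return items[best].child_blkno;
    }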

So it appears to me that we could allow the maximum size of a btree
entry to be just less than half a page, rather than just less than
a third of a page --- the worst-case requirement for a non-leaf page
is not three real tuples, but one tuple header and two real tuples.
On a leaf page we might manage to fit only one real data item, but
AFAICS that doesn't pose any correctness problems.
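
For concreteness, a back-of-the-envelope sketch in C of the two limits,
assuming an 8 KB page, a 24-byte page header, 4-byte line pointers, and a
16-byte btree special area (the server's real macros differ slightly in
alignment details):

    /*
     * Rough arithmetic only; the constants below approximate an 8 KB page
     * and are not the exact macros used in the server.
     */
    #include <stdio.h>

    int
    main(void)
    {
        const int page_size   = 8192;  /* BLCKSZ */
        const int page_header = 24;    /* fixed page header */
        const int line_ptr    = 4;     /* per-item line pointer */
        const int special     = 16;    /* btree opaque area at end of page */
        const int usable      = page_size - page_header - special
                                - 3 * line_ptr;

        /* current rule: three full tuples must fit */
        printf("divide by 3: ~%d bytes\n", usable / 3);   /* ~2.7K */

        /* proposed rule: two full tuples plus one bare header must fit
         * (the ~8-byte header is ignored here for a rough upper bound) */
        printf("divide by 2: ~%d bytes\n", usable / 2);   /* ~4K */

        return 0;
    }

Under those assumptions the divide-by-three rule comes out near 2.7K and
the divide-by-two rule near 4K.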

Obviously a tree containing many such pages would be awfully inefficient
to search, but I think a more common case is that there are a few wide
entries in an index of mostly short entries, and so pushing the hard
limit up a little would add some flexibility with little performance
cost in real-world cases.

Have I missed something? Is this worth changing?

regards, tom lane


  • Josh Berkus at Jul 11, 2006 at 2:46 pm
    Tom,
    > Obviously a tree containing many such pages would be awfully inefficient
    > to search, but I think a more common case is that there are a few wide
    > entries in an index of mostly short entries, and so pushing the hard
    > limit up a little would add some flexibility with little performance
    > cost in real-world cases.
    >
    > Have I missed something? Is this worth changing?
    Not sure. I don't know that the difference between 2.7K and 3.9K would
    have ever made a difference to me in any real-world case.

    If we're going to tinker with this code, it would be far more valuable
    to automatically truncate b-tree entries at, say, 1K so that they could
    be efficiently indexed.

    Of course, a quick archives search of -SQL, -Newbie and -General would
    indicate how popular of an issue this is.

    --Josh Berkus
  • Hannu Krosing at Jul 11, 2006 at 9:20 pm

    On Tue, 2006-07-11 at 10:46, Josh Berkus wrote:
    > Tom,
    >> Obviously a tree containing many such pages would be awfully inefficient
    >> to search, but I think a more common case is that there are a few wide
    >> entries in an index of mostly short entries, and so pushing the hard
    >> limit up a little would add some flexibility with little performance
    >> cost in real-world cases.
    >>
    >> Have I missed something? Is this worth changing?
    > Not sure. I don't know that the difference between 2.7K and 3.9K would
    > have ever made a difference to me in any real-world case.
    One (hopefully) soon-to-be real-world case is index-only queries.

    We discussed one approach with Luke and he expressed interest in getting
    it actually done in the not-too-distant future.
    > If we're going to tinker with this code, it would be far more valuable
    > to automatically truncate b-tree entries at, say, 1K so that they could
    > be efficiently indexed.
    That would not work if we want to get all the data from indexes.

    Maybe compressing the keys (like we do for TOAST) would be a better
    solution.
    > Of course, a quick archives search of -SQL, -Newbie and -General would
    > indicate how popular of an issue this is.
    It may become popular again when we are able to do index-only
    scans.

    --
    ----------------
    Hannu Krosing
    Database Architect
    Skype Technologies OÜ
    Akadeemia tee 21 F, Tallinn, 12618, Estonia

    Skype me: callto:hkrosing
    Get Skype for free: http://www.skype.com
  • Jim C. Nasby at Jul 19, 2006 at 10:16 pm

    On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
    > Currently, we restrict btree index tuples to a size that ensures three of
    > them will fit on a page. The motivation for this is the following two
    > considerations:
    >
    > 1. In a non-rightmost page, we need to include a "high key", or page
    > boundary key, that isn't one of the useful data keys.
    Why does a leaf page need a boundary key? ISTM if that wasn't the case,
    we could actually allow keys to be nearly 8K, constrained by a non-leaf
    page needing to include two pointers.

    I guess I must be missing something here (and nbtree/README isn't
    helping).
    > 2. In a non-leaf page, there had better be at least two child pages
    > (downlink entries), else we have failed to subdivide the page's key
    > range at all, and thus there would be a nonterminating recursion.
    >
    > However: a non-leaf page actually has one more pointer than key,
    > eg a page with three children needs only two data keys:
    >
    > ---------------- entire key range assigned to page ------------------
    >
    > -- range 1 -- boundary key -- range 2 -- boundary key -- range 3 --
    >       v                           v                          v
    > child page 1               child page 2               child page 3
    >
    > We implement this by having the first data "tuple" on a non-leaf page
    > contain only a downlink TID and no key data, ie it's just the header.
    >
    > So it appears to me that we could allow the maximum size of a btree
    > entry to be just less than half a page, rather than just less than
    > a third of a page --- the worst-case requirement for a non-leaf page
    > is not three real tuples, but one tuple header and two real tuples.
    > On a leaf page we might manage to fit only one real data item, but
    > AFAICS that doesn't pose any correctness problems.
    >
    > Obviously a tree containing many such pages would be awfully inefficient
    > to search, but I think a more common case is that there are a few wide
    > entries in an index of mostly short entries, and so pushing the hard
    > limit up a little would add some flexibility with little performance
    > cost in real-world cases.
    >
    > Have I missed something? Is this worth changing?
    >
    > regards, tom lane

    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Tom Lane at Jul 19, 2006 at 10:23 pm

    "Jim C. Nasby" <jnasby@pervasive.com> writes:
    > On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
    >> 1. In a non-rightmost page, we need to include a "high key", or page
    >> boundary key, that isn't one of the useful data keys.
    > Why does a leaf page need a boundary key?
    So you can tell whether a proposed insertion ought to go into this page,
    or the one to its right. The tree descent logic doesn't guarantee that
    you descend to exactly the correct page --- if concurrent page splits
    are going on, you might have to "move right" one or more times after
    reaching the leaf level. You need the boundary key to make this test
    correctly.

    And of course, the reason there's no high key on the rightmost page is
    exactly that it has no right-hand neighbor, hence no upper limit on its
    delegated key space.
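
    As a rough illustration of that test (the DemoPage type and
    find_insert_page function below are hypothetical, not the actual
    _bt_moveright() code):

        /*
         * Hypothetical sketch -- DemoPage and find_insert_page are made-up
         * names, not the server's actual _bt_moveright() code.
         */
        #include <stdbool.h>

        typedef struct DemoPage
        {
            bool             is_rightmost;   /* rightmost page: no high key */
            int              high_key;       /* upper bound of this page's
                                              * delegated key space */
            struct DemoPage *right_sibling;  /* right neighbor at this level */
        } DemoPage;

        /*
         * After descending to the leaf level, keep moving right while the
         * insertion key lies beyond this page's high key, which can happen
         * when concurrent splits shifted the key space rightward after the
         * descent path was chosen.
         */
        static DemoPage *
        find_insert_page(DemoPage *page, int insert_key)
        {
            while (!page->is_rightmost && insert_key > page->high_key)
                page = page->right_sibling;
            return page;
        }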

    regards, tom lane
  • Jim C. Nasby at Jul 19, 2006 at 11:28 pm

    On Wed, Jul 19, 2006 at 06:23:44PM -0400, Tom Lane wrote:
    "Jim C. Nasby" <jnasby@pervasive.com> writes:
    On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
    1. In a non-rightmost page, we need to include a "high key", or page
    boundary key, that isn't one of the useful data keys.
    Why does a leaf page need a boundary key?
    So you can tell whether a proposed insertion ought to go into this page,
    or the one to its right. The tree descent logic doesn't guarantee that
    you descend to exactly the correct page --- if concurrent page splits
    are going on, you might have to "move right" one or more times after
    reaching the leaf level. You need the boundary key to make this test
    correctly.

    And of course, the reason there's no high key on the rightmost page is
    exactly that it has no right-hand neighbor, hence no upper limit on its
    delegated key space.
    Could you not just scan right and see what the first key was? Though
    granted, that means there's a chance of a wasted page scan, but I think
    that'd be somewhat of a corner case, so it might not be bad.
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Tom Lane at Jul 19, 2006 at 11:39 pm

    "Jim C. Nasby" <jnasby@pervasive.com> writes:
    > Could you not just scan right and see what the first key was? Though
    > granted, that means there's a chance of a wasted page scan, but I think
    > that'd be somewhat of a corner case, so it might not be bad.
    No, because (a) that confuses the first key that happens to be on a page
    with its keyspace boundary --- what happens when you need to delete that
    data key? and (b) because of locking considerations, you don't want to
    move right and then have to back up. You'd have to hold lock on the
    first page while reading in the second, which makes for a nontrivial
    performance hit.

    regards, tom lane
