Hello hackers,

I'm thinking about the btree metapage locking problem.

In the current situation there are three operations that lock the
metapage:
1. when looking for the root page (share lock, "getroot")
2. when setting a new root page (exclusive lock, "setroot")
3. when extending the relation (exclusive lock, "extend").

Now, I want to add three more operations:
4. add a page to the freelist ("addfree")
5. get a page from the freelist ("getfree")
6. shrink a relation ("shrink").

The shrink operation only happens when one tries to add the last page of
the relation to the freelist. I don't know how common that is, but in
the case of relation truncation or massive deletion it is important.


What I want is to be able to do getroot and setroot without being
blocked by any of the other four. Of course the other four are all
mutually exclusive.

There doesn't seem to be a way to acquire two different locks on the
same page, so I propose to lock InvalidBlockNumber for the btree, and
use that as the lock to be obtained before doing extend, addfree,
getfree or shrink. The setroot/getroot operations would still use the
locking on BTREE_METAPAGE, so they can proceed whether the
InvalidBlockNumber lock is held or not.


On a different topic: I think the freelist should be represented as a
freelist count (btm_numfreepages, an int32) in BTMetaPageData, followed
by an array of BlockNumbers. Adding a new page is done by sequentially
scanning the list until a zero is found or the end of the block is
reached. Getting a page sequentially scans the same list until a
blocknumber > 0 is found (iff btm_numfreepages is greater than zero).
This allows for ~2000 free pages (very unlikely to actually be reached
if the shrink operation is in place).
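In rough C, the scheme could look like this. This is a standalone sketch, not actual nbtree code: the struct layout, the 200-byte header allowance behind the ~2000 figure, and the names freelist_add/freelist_get are all invented for illustration, assuming 8 KB pages and 4-byte BlockNumbers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;
#define InvalidBlock 0          /* 0 marks an empty freelist slot here */

/* Rough capacity: an 8 KB page minus ~200 bytes assumed for headers and
 * the rest of BTMetaPageData, divided by 4 bytes per BlockNumber, is on
 * the order of 2000 slots. */
#define FREELIST_SLOTS ((8192 - 200) / sizeof(BlockNumber))

typedef struct {
    int32_t     btm_numfreepages;
    BlockNumber freelist[FREELIST_SLOTS];
} FakeMetaPage;

/* addfree: scan until a zero slot is found or the page ends. */
static int freelist_add(FakeMetaPage *m, BlockNumber blk)
{
    for (size_t i = 0; i < FREELIST_SLOTS; i++)
        if (m->freelist[i] == InvalidBlock) {
            m->freelist[i] = blk;
            m->btm_numfreepages++;
            return 1;
        }
    return 0;                   /* freelist full */
}

/* getfree: scan for a blocknumber > 0, but only bother if the count
 * says there is something to find. */
static BlockNumber freelist_get(FakeMetaPage *m)
{
    if (m->btm_numfreepages <= 0)
        return InvalidBlock;
    for (size_t i = 0; i < FREELIST_SLOTS; i++)
        if (m->freelist[i] != InvalidBlock) {
            BlockNumber blk = m->freelist[i];
            m->freelist[i] = InvalidBlock;
            m->btm_numfreepages--;
            return blk;
        }
    return InvalidBlock;
}
```

Both operations are O(number of slots), but the scan stays inside a single already-pinned page, which is the point of keeping the list in the metapage.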

Comments? Another solution would be to have a separate page for the
freelist, but that seems wasteful.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)


  • Tom Lane at Oct 7, 2002 at 6:31 pm

    Alvaro Herrera writes:
    > There doesn't seem to be a way to acquire two different locks on the
    > same page, so I propose to lock InvalidBlockNumber for the btree, and
    > use that as the lock to be obtained before doing extend, addfree,
    > getfree or shrink. The setroot/getroot operations would still use the
    > locking on BTREE_METAPAGE, so they can proceed whether the
    > InvalidBlockNumber lock is held or not.

    Unfortunately, blkno == InvalidBlockNumber is already taken --- it's the
    encoding of a relation-level lock. Compare LockRelation and LockPage
    in src/backend/storage/lmgr/lmgr.c.

    It looks to me like nbtree.c is currently using LockPage(rel, 0) to
    interlock relation extension --- this is the same convention used to
    interlock heap extension. But access to the metapage is determined by
    buffer locks, which are an independent facility.

    I agree with your conclusion that extend, shrink, addfree, and getfree
    operations may as well all use the same exclusive lock. I think it
    could be LockPage(rel, 0), though.

    Since addfree/getfree need to touch the metapage, at first glance it
    would seem that they need to do LockPage(rel, 0) *plus* get a buffer
    lock on the metapage. But I think it might work for them to get only
    a shared lock on the metapage; the LockPage lock will guarantee
    exclusion against other addfree/getfree operations, and they don't
    really care if someone else is changing the root pointer. This is ugly
    but given the limited set of operations that occur against a btree
    metapage, it seems acceptable to me. Comments anyone?
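    The resulting protocol might be sketched like this. It is a toy trace-based illustration, not real backend code: the stub names merely echo LockPage and the buffer-lock calls, and the logging exists only so the ordering can be checked.

```c
#include <assert.h>
#include <string.h>

/* A trace buffer standing in for real lock acquisition; nothing here
 * touches actual lmgr or buffer-manager code. */
static char trace[256];
static void step(const char *s) { strcat(trace, s); strcat(trace, " "); }

/* getroot/setroot: metapage buffer lock only, never LockPage(rel, 0),
 * so they are never blocked by the four freelist/extension operations. */
static void getroot(void)
{
    step("buflock(meta,share)");
    /* ... read btm_root ... */
    step("bufunlock(meta)");
}

/* extend/shrink/addfree/getfree: all serialize on LockPage(rel, 0).
 * addfree/getfree then take only a SHARED buffer lock on the metapage;
 * the LockPage lock already excludes other freelist operations, and a
 * concurrent root-pointer change is harmless to them. */
static void getfree(void)
{
    step("LockPage(rel,0,exclusive)");
    step("buflock(meta,share)");
    /* ... scan and update the freelist stored in the metapage ... */
    step("bufunlock(meta)");
    step("UnlockPage(rel,0)");
}
```

The key property is visible in the traces: root access takes no page-level lock at all, so it cannot queue behind a long-running extend or shrink.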

    > On a different topic: I think the freelist should be represented as a
    > freelist count (btm_numfreepages, an int32) in BTMetaPageData, followed
    > by an array of BlockNumbers. Adding a new page is done by sequentially
    > scanning the list until a zero is found or the end of the block is
    > reached. Getting a page sequentially scans the same list until a
    > blocknumber > 0 is found (iff btm_numfreepages is greater than zero).
    > This allows for ~2000 free pages (very unlikely to actually be reached
    > if the shrink operation is in place).

    No more than 2000 free pages in an index? What makes you think that's
    unlikely? It's only 16 meg of free space, which would be about 1% of
    a gigabyte-sized index. I think it's a bad idea to make such a
    hardwired assumption.

    The original concept I believe was to have a linked list of free pages,
    ie, each free page contains the pointer to the next one. This allows
    indefinite expansion. It does mean that you have to read in the
    selected free page to get the next-free-page pointer from it, which you
    have to do to update the metapage, so that read has to happen while
    you're still holding the getfree lock. That's annoying.

    Perhaps we could use a hybrid data structure: up to 2000 free pages
    stored in the metapage, with any remaining ones chained together.
    Most of the time, freelist operations wouldn't need to touch the
    chain, hopefully.
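    A standalone sketch of such a hybrid follows. The names, the tiny 4-slot array, and the simulated next-pointers are invented for illustration; real code would keep ~2000 slots in the metapage and read actual pages to follow the chain.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define INVALID_BLOCK 0
#define META_SLOTS 4            /* tiny stand-in for ~2000 metapage slots */
#define NPAGES 64               /* simulated relation size */

/* Simulated pages: a free page's first word holds the next-free
 * pointer, as in the chained scheme. */
static BlockNumber next_ptr[NPAGES];

static struct {
    int32_t     nfree;          /* entries in the metapage array */
    BlockNumber slots[META_SLOTS];
    BlockNumber chain_head;     /* head of the overflow chain, 0 if empty */
} meta;

static void addfree(BlockNumber blk)
{
    for (int i = 0; i < META_SLOTS; i++)
        if (meta.slots[i] == INVALID_BLOCK) {
            meta.slots[i] = blk;
            meta.nfree++;
            return;
        }
    /* Metapage array full: push onto the chain. Writing the next
     * pointer means touching the freed page itself. */
    next_ptr[blk] = meta.chain_head;
    meta.chain_head = blk;
}

static BlockNumber getfree(void)
{
    for (int i = 0; i < META_SLOTS; i++)
        if (meta.slots[i] != INVALID_BLOCK) {
            BlockNumber blk = meta.slots[i];
            meta.slots[i] = INVALID_BLOCK;
            meta.nfree--;
            return blk;
        }
    /* Fall back to the chain: this is the only case that forces a read
     * of the free page itself while the getfree lock is still held. */
    if (meta.chain_head != INVALID_BLOCK) {
        BlockNumber blk = meta.chain_head;
        meta.chain_head = next_ptr[blk];
        return blk;
    }
    return INVALID_BLOCK;
}
```

The common path (array non-empty) never dereferences a free page, so the extra page read is confined to the overflow case.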

    regards, tom lane

Discussion Overview
group: pgsql-hackers @ postgresql
posted: Oct 7, '02 at 5:13p
active: Oct 7, '02 at 6:31p
posts: 2 (Alvaro Herrera: 1, Tom Lane: 1)