I have a postgres application, with customers using version 7.4 and the next
version of the product being built on 8.3 (with HOT). We are having problems
with two bloated indexes. One index is on a field whose value never changes.
The other, which bloats more severely, is on a field whose value changes on
every update.

If I have to, I'll do periodic reindexing. Trying to hide that will be somewhat
tricky.

But I'm wondering about an alternative approach. Suppose I have table T and an
index on T, idx1(some_column). Instead of reindexing idx1, how well would
this work:

- Create a second index, idx2(some_column), using the CONCURRENTLY option.

- When idx2 is ready, drop idx1.
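
The two steps above can be sketched as a small helper that only builds the SQL statements. This is a minimal sketch, not a tested procedure: the names `quote_ident` and `index_swap_statements` are hypothetical, and it assumes a server that supports CREATE INDEX CONCURRENTLY (8.3 does; 7.4 does not). CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the statements would have to be sent one at a time in autocommit mode.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes.

    Quoted identifiers are case-sensitive, so pass names exactly as they
    are stored in the catalog (usually lowercase).
    """
    return '"' + name.replace('"', '""') + '"'


def index_swap_statements(table, column, old_index, new_index):
    """Return the SQL statements for the create-then-drop approach.

    Hypothetical helper: it only builds strings and never talks to a server.
    """
    return [
        # Step 1: build the replacement index without blocking writes.
        "CREATE INDEX CONCURRENTLY %s ON %s (%s)"
        % (quote_ident(new_index), quote_ident(table), quote_ident(column)),
        # Step 2: once the new index is valid, drop the bloated one.
        "DROP INDEX %s" % quote_ident(old_index),
    ]


if __name__ == "__main__":
    for stmt in index_swap_statements("t", "some_column", "idx1", "idx2"):
        print(stmt + ";")
```

The second statement should only be sent after the first has completed successfully; the sketch leaves that check to the caller.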

Questions:

- How well does the CONCURRENTLY option work? The documentation on REINDEX says
"An index build with the CONCURRENTLY option failed ..." I'm not sure if I'm
supposed to read this as a comment on the reliability of the CONCURRENTLY option.

- How much of a drag on performance will be caused by the creation of idx2?

- How will the optimizer deal with this approach? When idx2 is ready and idx1 is
dropped, will the optimizer immediately switch over? What about optimization of
any prepared statements on connections whose lifetime spans the creation of idx2?

Jack Orenstein

  • Tom Lane at Feb 19, 2009 at 9:43 pm

    Jack Orenstein writes:
    > - How well does the CONCURRENTLY option work? The documentation on REINDEX says
    > "An index build with the CONCURRENTLY option failed ..." I'm not sure if I'm
    > supposed to read this as a comment on the reliability of the CONCURRENTLY option.

    No, you aren't. The point there is that CONCURRENTLY commits the
    creation of an index and then some time later the index becomes valid.
    If the system crashes in between you have a useless invalid index hanging
    around. You could manually drop it and try again; REINDEX just provides
    a different approach to fixing the problem.
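
To illustrate the point about leftover invalid indexes: an interrupted concurrent build leaves an index whose `pg_index.indisvalid` flag is false. Below is a minimal sketch of finding such indexes. It assumes a Python DB-API cursor connected to the database (for example from psycopg2, which is an assumption and not something the thread mentions), and the function name `find_invalid_indexes` is hypothetical.

```python
# Catalog query: indexes whose build never reached the "valid" state.
# Names are returned without their schema, which is enough for a sketch.
INVALID_INDEXES_SQL = """
SELECT c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid
ORDER BY c.relname
"""


def find_invalid_indexes(cursor):
    """Return the names of indexes marked invalid in the system catalog.

    `cursor` is assumed to be any DB-API cursor on an open connection.
    """
    cursor.execute(INVALID_INDEXES_SQL)
    return [row[0] for row in cursor.fetchall()]
```

Each index reported this way could then be dropped and rebuilt, or repaired with REINDEX, as described above.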

    AFAIK there aren't any particular reliability issues with CONCURRENTLY.

    > - How much of a drag on performance will be caused by the creation of idx2?

    You'd have to measure it in your environment, but if you haven't got
    cycles and I/O to spare it's going to hurt.

    > - How will the optimizer deal with this approach? When idx2 is ready and idx1 is
    > dropped, will the optimizer immediately switch over? What about optimization of
    > any prepared statements on connections whose lifetime spans the creation of idx2?

    Dropping of idx1 will force invalidation of any plans using it.
    Possibly more to the point, DROP INDEX takes exclusive lock on the
    table, so if you have long-running queries that's going to be an issue.
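
One possible way to limit the impact of that exclusive lock is to bound how long DROP INDEX may wait, so that it gives up instead of queueing behind a long-running query. This is an assumption added for illustration, not something suggested in the reply. The sketch relies on the `statement_timeout` setting, which also counts time spent waiting for locks; the function name `guarded_drop_statements` and the default timeout are hypothetical.

```python
def guarded_drop_statements(index_name, timeout_ms=5000):
    """Return statements that drop an index but give up if blocked too long.

    Intended to be sent one by one in autocommit mode, so that the final
    RESET still runs if the DROP is cancelled by the timeout.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    # Double-quote the identifier, escaping embedded double quotes.
    quoted = '"' + index_name.replace('"', '""') + '"'
    return [
        # Cancel any statement in this session that runs longer than this.
        "SET statement_timeout = %d" % timeout_ms,
        # Needs an exclusive lock on the table; cancelled if it waits too long.
        "DROP INDEX %s" % quoted,
        # Restore the session default afterwards.
        "RESET statement_timeout",
    ]
```

If the drop is cancelled, it can simply be retried later at a quieter moment.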

    regards, tom lane

Posted to pgsql-general (postgresql.org) on Feb 19, 2009 at 9:02 pm; last active Feb 19, 2009 at 9:43 pm.
