Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm
finding that it's taking an age to CREATE INDEX on a large table:

Column | Type | Modifiers
----------------+------------------------+---------------------------------------------------------------------
ID | integer | not null default nextval(('public.keyword_id_seq'::text)::regclass)
Text | character varying(200) |
Longitude | numeric(16,5) |
Latitude | numeric(16,5) |
AreaID | integer |
SearchCount | integer | not null default 0
Radius | integer |
LanguageID | integer |
KeywordType | character varying(20) |
LowerText | character varying(200) |
NumberOfHotels | integer |
CountryID | integer |
FriendlyText | character varying(200) |
Indexes:


2006-03-29 21:39:38 BST LOG: duration: 41411.625 ms statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING btree ("AreaID");
2006-03-29 21:42:46 BST LOG: duration: 188550.644 ms statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING btree ("Longitude", "Latitude");
2006-03-29 21:46:41 BST LOG: duration: 234864.571 ms statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText");
2006-03-29 21:52:32 BST LOG: duration: 350757.565 ms statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType");

The table has just under six million rows - should it really be taking
nearly six minutes to add an index? These log snippets were taking
during a pg_restore on a newly created db, so there should be no issues
with the table needing vacuuming.

What parameters in the postgresql.conf are pertinent here? I have

shared_buffers 120000
work_mem 16384
maintenance_work_mem = 262144

for starters... any advice would be warmly welcomed!

Cheers,
Gavin.

Search Discussions

  • Tom Lane at Mar 30, 2006 at 3:19 pm

    Gavin Hamill writes:
    The table has just under six million rows - should it really be taking
    nearly six minutes to add an index?
    Try running it with trace_sort enabled to get more info about where the
    time is going.

    We've been doing some considerable work on the sorting code in the last
    couple months, so 8.2 should be better, but I'd like to verify that
    you're not seeing something we don't know about.
    maintenance_work_mem = 262144
    Fooling with this might affect the results some.

    regards, tom lane
  • Gavin Hamill at Mar 30, 2006 at 8:23 pm

    Tom Lane wrote:
    Gavin Hamill <gdh@laterooms.com> writes:

    The table has just under six million rows - should it really be taking
    nearly six minutes to add an index?
    Try running it with trace_sort enabled to get more info about where the
    time is going.

    We've been doing some considerable work on the sorting code in the last
    couple months, so 8.2 should be better, but I'd like to verify that
    you're not seeing something we don't know about.

    OKies, I dropped the db, created again so it's all clean, ran pg_restore
    again with trace_sort on - here's the output from one of the larger
    CREATE INDEXes:

    2006-03-30 16:48:53 BST LOG: begin index sort: unique = f, workMem =
    262144, randomAccess = f
    2006-03-30 16:49:04 BST LOG: switching to external sort: CPU
    0.88s/9.99u sec elapsed 10.90 sec

    2006-03-30 16:49:44 BST LOG: autovacuum: processing database "postgres"
    2006-03-30 16:50:38 BST LOG: performsort starting: CPU 1.69s/102.73u
    sec elapsed 104.58 sec
    2006-03-30 16:50:44 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 16:51:44 BST LOG: autovacuum: processing database "postgres"
    2006-03-30 16:52:23 BST LOG: finished writing run 1: CPU 2.40s/206.53u
    sec elapsed 209.30 sec
    2006-03-30 16:52:39 BST LOG: finished writing final run 2: CPU
    2.51s/222.98u sec elapsed 225.89 sec
    2006-03-30 16:52:40 BST LOG: performsort done (except final merge): CPU
    2.59s/223.99u sec elapsed 226.98 sec
    2006-03-30 16:52:44 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 16:52:53 BST LOG: external sort ended, 21292 disk blocks
    used: CPU 3.65s/233.10u sec elapsed 239.35 sec
    2006-03-30 16:52:53 BST LOG: duration: 239381.535 ms statement: CREATE
    INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText");


    During all this, there's been about 900KB/sec of disk activity. The
    disks are RAID1 and will happily sustain 50MB/sec with minimal system
    overhead.

    I'm guessing then that an external sort means disk-based...
    maintenance_work_mem = 262144
    Fooling with this might affect the results some.
    OK will tinker with that - it's not a massive problem since I hope I
    never have to do a pg_restore once the live server is running fulltime :)

    Right - I bumped maintenance_work_mem up to 1GB, tried dropping the
    index and recreating, and sure enough it's an internal sort now,
    chopping 10% off the time taken:

    2006-03-30 21:15:57 BST LOG: begin index sort: unique = f, workMem =
    1048576, randomAccess = f
    2006-03-30 21:16:03 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 21:16:12 BST LOG: performsort starting: CPU 1.20s/13.85u sec
    elapsed 15.07 sec
    2006-03-30 21:17:03 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 21:18:03 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 21:19:03 BST LOG: autovacuum: processing database "laterooms"
    2006-03-30 21:19:28 BST LOG: performsort done: CPU 1.20s/210.34u sec
    elapsed 211.69 sec
    2006-03-30 21:19:36 BST LOG: internal sort ended, 336538 KB used: CPU
    2.06s/212.61u sec elapsed 218.80 sec
    2006-03-30 21:19:36 BST LOG: duration: 218847.055 ms statement: CREATE
    INDEX ix_keyword_lowertext on "Keyword" USING btree ("LowerText");

    If that's reasonable performance from 8.1, then that's fine - I just
    didn't want to be inadvertantly running way under average :)

    Cheers,
    Gavin.
  • Simon Riggs at Mar 30, 2006 at 5:08 pm

    On Thu, 2006-03-30 at 09:26 +0100, Gavin Hamill wrote:
    Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm
    finding that it's taking an age to CREATE INDEX on a large table:

    Column | Type | Modifiers
    ----------------+------------------------+---------------------------------------------------------------------
    ID | integer | not null default nextval(('public.keyword_id_seq'::text)::regclass)
    Text | character varying(200) |
    Longitude | numeric(16,5) |
    Latitude | numeric(16,5) |
    AreaID | integer |
    SearchCount | integer | not null default 0
    Radius | integer |
    LanguageID | integer |
    KeywordType | character varying(20) |
    LowerText | character varying(200) |
    NumberOfHotels | integer |
    CountryID | integer |
    FriendlyText | character varying(200) |
    Indexes:


    2006-03-29 21:39:38 BST LOG: duration: 41411.625 ms statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING btree ("AreaID");
    2006-03-29 21:42:46 BST LOG: duration: 188550.644 ms statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING btree ("Longitude", "Latitude");
    2006-03-29 21:46:41 BST LOG: duration: 234864.571 ms statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText");
    2006-03-29 21:52:32 BST LOG: duration: 350757.565 ms statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType");

    The table has just under six million rows - should it really be taking
    nearly six minutes to add an index? These log snippets were taking
    during a pg_restore on a newly created db, so there should be no issues
    with the table needing vacuuming.
    The index build time varies according to the number and type of the
    datatypes, as well as the distribution of values in the table. As well
    as the number of rows in the table.

    Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20))
    What parameters in the postgresql.conf are pertinent here? I have

    shared_buffers 120000
    work_mem 16384
    maintenance_work_mem = 262144
    Try trace_sort = on and then rerun the index builds to see what's
    happening there. We've speeded sort up by about 2.5 times in the current
    development version, but it does just run in single threaded mode so
    your 8 CPUs aren't helping there.

    Looks like you might be just over the maintenance_work_mem limit for the
    last index builds. You can try doubling maintenance_work_mem.

    The extended runtime for KeywordType is interesting in comparison to
    LowerText, which on the face of it is a longer column. My guess would be
    that LowerText is fairly unique and sorts quickly, whereas KeywordType
    is fairly non-unique with a high average row length that require
    complete string comparison before deciding it is actually the same
    value. You might want to try using codes rather than textual
    KeywordTypes.

    You might try using partial indexes also, along the lines of

    CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL;

    Best Regards, Simon Riggs
  • Gavin Hamill at Mar 30, 2006 at 8:45 pm

    On Thu, 30 Mar 2006 18:08:44 +0100 Simon Riggs wrote:
    Hello again Simon :)
    The index build time varies according to the number and type of the
    datatypes, as well as the distribution of values in the table. As well
    as the number of rows in the table.

    Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20))
    Fair enough. :) Is there much of a performance increase by using fixed-length character fields instead of varchars?
    Try trace_sort = on and then rerun the index builds to see what's
    happening there. We've speeded sort up by about 2.5 times in the current
    development version, but it does just run in single threaded mode so
    your 8 CPUs aren't helping there.
    Yum - I look forward to the 8.2 release =)
    Looks like you might be just over the maintenance_work_mem limit for the
    last index builds. You can try doubling maintenance_work_mem.
    You were right - needed ~370MB ... I'm happy to alloc 1GB to allow for db growth..
    The extended runtime for KeywordType is interesting in comparison to
    LowerText, which on the face of it is a longer column. My guess would be
    that LowerText is fairly unique and sorts quickly, whereas KeywordType
    is fairly non-unique with a high average row length that require
    complete string comparison before deciding it is actually the same
    value.
    From looking at a few samples of the millions of rows it seems that it's actually KeywordType that's more unique - LowerText is simply an lowercase representation of the name of this search-keyword, so it's much less unique. Fun stuff :)
    You might want to try using codes rather than textual KeywordTypes.
    That makes sense - I can't get a grip on the data in KeywordType at the moment .. many are more obvious like 'RGN' 'AREA' 'MKT' 'LK' for Region, Area, Market and Lake, but many other rows have '1'.
    You might try using partial indexes also, along the lines of

    CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL;
    Well, each row does have a KeywordType, so no row has a NULL entry...
    Best Regards, Simon Riggs
    Cheers :)
    Gavin.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedMar 30, '06 at 8:28a
activeMar 30, '06 at 8:45p
posts5
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase