Hackers,

The attached patch enables GIN to store additional information with item
pointers in posting lists and trees.
Such additional information could be word positions, trigram positions,
array lengths, and so on.
This is the first and largest patch in a series of GIN improvements that was
presented at PGConf.EU:
http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

The patch modifies the GIN interface as follows:
1) Two arguments are added to extractValue:
Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent:
Datum addInfo[], bool addInfoIsNull[]
3) A new method, config, is introduced, which returns the datatype OID of the
additional information (by analogy with the SP-GiST config method).
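
For illustration, the C-level shape of these interface changes might look
roughly like the sketch below. The GinConfig struct and all names are
assumptions based on the description above and on the documented GIN and
SP-GiST interfaces, not the patch's actual declarations:

  /* Hypothetical sketch of the extended GIN support-function interface.
   * Types (Datum, Oid, int32, StrategyNumber, Pointer) come from
   * "postgres.h"; the names here are assumptions, not the patch's code. */

  /* config: reports the datatype of the additional information,
   * analogous to SP-GiST's config output. */
  typedef struct GinConfig
  {
      Oid     addInfoTypeOid;     /* type OID of per-item additional info */
  } GinConfig;

  /* extractValue with the two added output arguments: one additional-info
   * Datum and one NULL flag per extracted key. */
  Datum *extractValue(Datum itemValue, int32 *nkeys, bool **nullFlags,
                      Datum **addInfo, bool **addInfoIsNull);

  /* consistent with the two added input arrays: the additional info
   * stored for each matched key. */
  bool consistent(bool check[], StrategyNumber n, Datum query, int32 nkeys,
                  Pointer extra_data[], bool *recheck, Datum queryKeys[],
                  bool nullFlags[], Datum addInfo[], bool addInfoIsNull[]);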

The patch completely changes the storage format of posting lists and of the
leaf pages of posting trees. It uses varbyte encoding for BlockNumber and
OffsetNumber. BlockNumbers are stored incrementally (as deltas) within a
page. Additionally, one bit of the OffsetNumber is reserved as a NULL flag
for the additional information. To make it possible to find a position in a
leaf data page quickly, the patch introduces a small index at the end of the
page.
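
For illustration, varbyte encoding of the kind described above could look
like this minimal sketch (an illustration of the technique, not the patch's
actual code):

  #include <stdint.h>

  /* Each byte carries 7 payload bits; the high bit means "more bytes
   * follow".  Since block numbers are delta-encoded against the previous
   * item on the page, small deltas fit in a single byte. */
  static unsigned char *
  encode_varbyte(uint32_t val, unsigned char *ptr)
  {
      while (val > 0x7F)
      {
          *ptr++ = (unsigned char) ((val & 0x7F) | 0x80);
          val >>= 7;
      }
      *ptr++ = (unsigned char) val;       /* last byte: high bit clear */
      return ptr;
  }

  static const unsigned char *
  decode_varbyte(uint32_t *val, const unsigned char *ptr)
  {
      uint32_t        result = 0;
      int             shift = 0;
      unsigned char   b;

      do
      {
          b = *ptr++;
          result |= (uint32_t) (b & 0x7F) << shift;
          shift += 7;
      } while (b & 0x80);

      *val = result;
      return ptr;
  }

Such a delta stream can only be decoded sequentially, which is why the small
index at the end of the page matters: it lets a scan skip close to the right
position before it starts decoding.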

------
With best regards,
Alexander Korotkov.

  • Tomas Vondra at Dec 2, 2012 at 1:02 am

    On 18.11.2012 22:54, Alexander Korotkov wrote:
    Hackers,

    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    Hi,

    I've tried to apply the patch with the current HEAD, but I'm getting
    segfaults whenever VACUUM runs (either called directly or from autovac
    workers).

    The patch applied cleanly against 9b3ac49e and needed a minor fix when
    applied on HEAD (because of an assert added to ginRedoCreatePTree), but
    that shouldn't be a problem.

    The backtrace always looks like this:

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004dea3b in processPendingPage (accum=0x7fff15ab8aa0,
    ka=0x7fff15ab8a70, page=0x7f88774a7ea0 "", startoff=1) at ginfast.c:785
    785 addInfo = index_getattr(itup, 2,
    accum->ginstate->tupdesc[curattnum - 1], &addInfoIsNull);
    (gdb) bt
    #0 0x00000000004dea3b in processPendingPage (accum=0x7fff15ab8aa0,
    ka=0x7fff15ab8a70, page=0x7f88774a7ea0 "", startoff=1) at ginfast.c:785
    #1 0x00000000004df3c6 in ginInsertCleanup (ginstate=0x7fff15ab97c0,
    vac_delay=1 '\001', stats=0xfb0050) at ginfast.c:909
    #2 0x00000000004dbe8c in ginbulkdelete (fcinfo=0x7fff15abbfb0) at
    ginvacuum.c:747



    Reproducing the issue is quite simple:

    1) create table messages (id int, txt text, ts tsvector);
    2) insert into messages select i, substr(md5(i::text), 0, 4),
                   to_tsvector('english', substr(md5(i::text), 0, 4))
                   from generate_series(1,100000) s(i);
    3) vacuum messages
    4) ... segfault :-(

    regards
    Tomas
  • Robert Haas at Dec 4, 2012 at 5:34 pm

    On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov wrote:
    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    This sounds like it means that this would break pg_upgrade, about
    which I'm not too keen. Ideally, we'd like to have a situation where
    new indexes have additional capabilities, but old indexes are still
    usable for things that they could do before. I am not sure whether
    that's a realistic goal.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Dec 4, 2012 at 6:04 pm

    On 12/4/12 9:34 AM, Robert Haas wrote:
    On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
    wrote:
    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    This sounds like it means that this would break pg_upgrade, about
    which I'm not too keen. Ideally, we'd like to have a situation where
    new indexes have additional capabilities, but old indexes are still
    usable for things that they could do before. I am not sure whether
    that's a realistic goal.
    Is there a reason not to create this as a new type of index? "GIN2" or
    whatever?


    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Andres Freund at Dec 4, 2012 at 6:05 pm

    On 2012-12-04 10:04:03 -0800, Josh Berkus wrote:
    On 12/4/12 9:34 AM, Robert Haas wrote:
    On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
    wrote:
    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    This sounds like it means that this would break pg_upgrade, about
    which I'm not too keen. Ideally, we'd like to have a situation where
    new indexes have additional capabilities, but old indexes are still
    usable for things that they could do before. I am not sure whether
    that's a realistic goal.
    Is there a reason not to create this as a new type of index? "GIN2" or
    whatever?
    Aren't the obvious maintenance problems enough?

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Alexander Korotkov at Dec 4, 2012 at 7:12 pm
    Hi!
    On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra wrote:

    I've tried to apply the patch with the current HEAD, but I'm getting
    segfaults whenever VACUUM runs (either called directly or from autovac
    workers).

    The patch applied cleanly against 9b3ac49e and needed a minor fix when
    applied on HEAD (because of an assert added to ginRedoCreatePTree), but
    that shouldn't be a problem.
    Thanks for testing! The patch has been rebased against HEAD, and the bug
    you reported has been fixed.

    ------
    With best regards,
    Alexander Korotkov.
  • Alexander Korotkov at Dec 4, 2012 at 7:30 pm

    On Tue, Dec 4, 2012 at 9:34 PM, Robert Haas wrote:

    On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
    wrote:
    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    This sounds like it means that this would break pg_upgrade, about
    which I'm not too keen. Ideally, we'd like to have a situation where
    new indexes have additional capabilities, but old indexes are still
    usable for things that they could do before. I am not sure whether
    that's a realistic goal.

    This would mean having two versions of the code that deals with posting
    trees and lists. It seems unlikely to me that we have the resources to
    maintain both.

    ------
    With best regards,
    Alexander Korotkov.
  • Alvaro Herrera at Dec 4, 2012 at 8:35 pm

    Alexander Korotkov wrote:
    On Tue, Dec 4, 2012 at 9:34 PM, Robert Haas wrote:

    On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
    wrote:
    The patch completely changes the storage format of posting lists and of
    the leaf pages of posting trees. It uses varbyte encoding for BlockNumber
    and OffsetNumber. BlockNumbers are stored incrementally (as deltas)
    within a page. Additionally, one bit of the OffsetNumber is reserved as a
    NULL flag for the additional information. To make it possible to find a
    position in a leaf data page quickly, the patch introduces a small index
    at the end of the page.
    This sounds like it means that this would break pg_upgrade, about
    which I'm not too keen. Ideally, we'd like to have a situation where
    new indexes have additional capabilities, but old indexes are still
    usable for things that they could do before. I am not sure whether
    that's a realistic goal.
    This would mean having two versions of the code that deals with posting
    trees and lists. It seems unlikely to me that we have the resources to
    maintain both.
    Witness how GIN has gone with unfixed bugs for months, even though
    patches to fix them have been posted. We don't have the manpower to
    maintain even *one* such implementation, let alone two.

    Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
    that they require reindex in the new cluster before they can be used for
    queries or index updates.

    --
    Álvaro Herrera http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Bruce Momjian at Dec 4, 2012 at 8:38 pm

    On Tue, Dec 4, 2012 at 05:35:24PM -0300, Alvaro Herrera wrote:
    This would mean having two versions of the code that deals with posting
    trees and lists. It seems unlikely to me that we have the resources to
    maintain both.
    Witness how GIN has gone with unfixed bugs for months, even though
    patches to fix them have been posted. We don't have the manpower to
    maintain even *one* such implementation, let alone two.

    Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
    that they require reindex in the new cluster before they can be used for
    queries or index updates.
    Yes, pg_upgrade has infrastructure to do that.

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Tomas Vondra at Dec 4, 2012 at 9:56 pm

    On 4.12.2012 20:12, Alexander Korotkov wrote:
    Hi!

    On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra wrote:

    I've tried to apply the patch with the current HEAD, but I'm getting
    segfaults whenever VACUUM runs (either called directly or from autovac
    workers).

    The patch applied cleanly against 9b3ac49e and needed a minor fix when
    applied on HEAD (because of an assert added to ginRedoCreatePTree), but
    that shouldn't be a problem.


    Thanks for testing! The patch has been rebased against HEAD, and the bug
    you reported has been fixed.
    Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
    The whole backtrace is here: http://pastebin.com/YEPuWeuV

    The messages written to the PostgreSQL log are quite variable - usually
    they look like this:

    2012-12-04 22:31:08 CET 31839 LOG: database system was not properly
    shut down; automatic recovery in progress
    2012-12-04 22:31:08 CET 31839 LOG: redo starts at 0/68A76E48
    2012-12-04 22:31:08 CET 31839 LOG: unexpected pageaddr 0/1BE64000 in
    log segment 000000010000000000000069, offset 15089664
    2012-12-04 22:31:08 CET 31839 LOG: redo done at 0/69E63638

    but I've seen this message too

    2012-12-04 22:20:29 CET 31709 LOG: database system was not properly
    shut down; automatic recovery in progress
    2012-12-04 22:20:29 CET 31709 LOG: redo starts at 0/AEAFAF8
    2012-12-04 22:20:29 CET 31709 LOG: record with zero length at 0/C7D5698
    2012-12-04 22:20:29 CET 31709 LOG: redo done at 0/C7D55E


    I wasn't able to prepare a simple testcase to reproduce this, so I've
    attached two files from my "fun project" where I noticed it. It's a
    simple DB + a bit of Python for indexing mbox archives inside Pg.

    - create.sql - a database structure with a bunch of GIN indexes on
                    tsvector columns of the "messages" table

    - load.py - script for parsing mbox archives / loading them into the
                 "messages" table (warning: it's a bit messy)


    Usage:

    1) create the DB structure
    $ createdb archives
    $ psql archives < create.sql

    2) fetch some archives (I consistently get SIGSEGV after the first three)
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz

    3) gunzip and load them using the python script
    $ gunzip pgsql-hackers.*.gz
    $ ./load.py --db archives pgsql-hackers.*

    4) et voila - a SIGSEGV :-(


    I suspect this might be related to the fact that the load.py script uses
    savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).


    Tomas
  • Tom Lane at Dec 4, 2012 at 10:35 pm

    Alvaro Herrera writes:
    Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
    that they require reindex in the new cluster before they can be used for
    queries or index updates.
    Bumping the version number in the GIN metapage would be sufficient.
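
    For illustration, such a check might look like the sketch below;
    GinPageGetMeta, ginVersion and GIN_CURRENT_VERSION are existing GIN
    symbols, but the function itself and its error wording are assumptions:

      #include "postgres.h"
      #include "access/gin_private.h"
      #include "utils/rel.h"

      /* Sketch: refuse to use a GIN index whose metapage version predates
       * the new on-disk format, forcing a REINDEX after pg_upgrade. */
      static void
      gin_check_metapage_version(Relation index, Page metapage)
      {
          GinMetaPageData *metadata = GinPageGetMeta(metapage);

          if (metadata->ginVersion != GIN_CURRENT_VERSION)
              ereport(ERROR,
                      (errcode(ERRCODE_INDEX_CORRUPTED),
                       errmsg("index \"%s\" has GIN version %d, expected %d",
                              RelationGetRelationName(index),
                              metadata->ginVersion, GIN_CURRENT_VERSION),
                       errhint("Please REINDEX it.")));
      }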

        regards, tom lane
  • Bruce Momjian at Dec 4, 2012 at 10:47 pm

    On Tue, Dec 4, 2012 at 05:35:27PM -0500, Tom Lane wrote:
    Alvaro Herrera <alvherre@2ndquadrant.com> writes:
    Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
    that they require reindex in the new cluster before they can be used for
    queries or index updates.
    Bumping the version number in the GIN metapage would be sufficient.
    And it is easy for pg_upgrade to report which indexes need rebuilding,
    and it can create a script file to do the reindexing.

    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Alexander Korotkov at Dec 5, 2012 at 8:11 am

    On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra wrote:
    On 4.12.2012 20:12, Alexander Korotkov wrote:
    Hi!

    On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

    I've tried to apply the patch with the current HEAD, but I'm getting
    segfaults whenever VACUUM runs (either called directly or from autovac
    workers).

    The patch applied cleanly against 9b3ac49e and needed a minor fix when
    applied on HEAD (because of an assert added to ginRedoCreatePTree), but
    that shouldn't be a problem.


    Thanks for testing! The patch has been rebased against HEAD, and the bug
    you reported has been fixed.
    Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
    The whole backtrace is here: http://pastebin.com/YEPuWeuV

    The messages written to the PostgreSQL log are quite variable - usually
    they look like this:

    2012-12-04 22:31:08 CET 31839 LOG: database system was not properly
    shut down; automatic recovery in progress
    2012-12-04 22:31:08 CET 31839 LOG: redo starts at 0/68A76E48
    2012-12-04 22:31:08 CET 31839 LOG: unexpected pageaddr 0/1BE64000 in
    log segment 000000010000000000000069, offset 15089664
    2012-12-04 22:31:08 CET 31839 LOG: redo done at 0/69E63638

    but I've seen this message too

    2012-12-04 22:20:29 CET 31709 LOG: database system was not properly
    shut down; automatic recovery in progress
    2012-12-04 22:20:29 CET 31709 LOG: redo starts at 0/AEAFAF8
    2012-12-04 22:20:29 CET 31709 LOG: record with zero length at 0/C7D5698
    2012-12-04 22:20:29 CET 31709 LOG: redo done at 0/C7D55E


    I wasn't able to prepare a simple testcase to reproduce this, so I've
    attached two files from my "fun project" where I noticed it. It's a
    simple DB + a bit of Python for indexing mbox archives inside Pg.

    - create.sql - a database structure with a bunch of GIN indexes on
    tsvector columns of the "messages" table

    - load.py - script for parsing mbox archives / loading them into the
    "messages" table (warning: it's a bit messy)


    Usage:

    1) create the DB structure
    $ createdb archives
    $ psql archives < create.sql

    2) fetch some archives (I consistently get SIGSEGV after the first three)
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
    $ wget
    http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz

    3) gunzip and load them using the python script
    $ gunzip pgsql-hackers.*.gz
    $ ./load.py --db archives pgsql-hackers.*

    4) et voila - a SIGSEGV :-(


    I suspect this might be related to the fact that the load.py script uses
    savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
    Thanks for the bug report. It is fixed in the attached patch.

    ------
    With best regards,
    Alexander Korotkov.
  • Tomas Vondra at Dec 6, 2012 at 1:45 am

    On 5.12.2012 09:10, Alexander Korotkov wrote:
    On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra wrote:

    Thanks for the bug report. It is fixed in the attached patch.
    Hi,

    I gave it another try and this time it went fine - I didn't get any
    segfault when loading the data, which is a good news.

    Then I've run a simple benchmarking script, and the results are not as
    good as I expected; in fact, I'm getting much worse performance than with
    the original GIN index.

    The following table contains the data loading time (not a big
    difference) and the number of queries per minute for various numbers of
    words in the query.

    The queries look like this:

    SELECT id FROM messages
      WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')

    so it's really the simplest form of FTS query possible.

                  without patch | with patch
    --------------------------------------------
    loading      750 sec    |    770 sec
    1 word      1500 q/min  |   1100 q/min
    2 words    23000 q/min  |   9800 q/min
    3 words    24000 q/min  |   9700 q/min
    4 words    16000 q/min  |   7200 q/min
    --------------------------------------------

    I'm not saying this is a perfect benchmark, but the differences (in
    querying) are pretty huge. I'm not sure where this difference comes from,
    but it seems to be quite consistent (I usually get +-10% variation, which
    is negligible considering the huge difference).

    Is this an expected behaviour that will be fixed by another patch?

    The database contains ~680k messages from the mailing list archives,
    i.e. about 900 MB of data (in the table), and the GIN index on tsvector
    is about 900MB too. So the whole dataset nicely fits into memory (8GB
    RAM), and it seems to be completely CPU bound (no I/O activity at all).

    The configuration was exactly the same in both cases

         shared buffers = 1GB
         work mem = 64 MB
         maintenance work mem = 256 MB

    I can either upload the database somewhere, or provide the benchmarking
    script if needed.

    Tomas
  • Alexander Korotkov at Dec 22, 2012 at 4:16 pm
    Hi!
    On Thu, Dec 6, 2012 at 5:44 AM, Tomas Vondra wrote:

    Then I've run a simple benchmarking script, and the results are not as
    good as I expected; in fact, I'm getting much worse performance than with
    the original GIN index.

    The following table contains the data loading time (not a big
    difference) and the number of queries per minute for various numbers of
    words in the query.

    The queries look like this:

    SELECT id FROM messages
    WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')

    so it's really the simplest form of FTS query possible.

                  without patch | with patch
    --------------------------------------------
    loading      750 sec    |    770 sec
    1 word      1500 q/min  |   1100 q/min
    2 words    23000 q/min  |   9800 q/min
    3 words    24000 q/min  |   9700 q/min
    4 words    16000 q/min  |   7200 q/min
    --------------------------------------------

    I'm not saying this is a perfect benchmark, but the differences (in
    querying) are pretty huge. I'm not sure where this difference comes from,
    but it seems to be quite consistent (I usually get +-10% variation, which
    is negligible considering the huge difference).

    Is this an expected behaviour that will be fixed by another patch?
    Further patches that significantly accelerate index search will be
    provided. This patch changes only the storage of GIN posting lists/trees.
    However, this patch wasn't expected to change index scan speed
    significantly in either direction.

    The database contains ~680k messages from the mailing list archives,
    i.e. about 900 MB of data (in the table), and the GIN index on tsvector
    is about 900MB too. So the whole dataset nicely fits into memory (8GB
    RAM), and it seems to be completely CPU bound (no I/O activity at all).

    The configuration was exactly the same in both cases

    shared buffers = 1GB
    work mem = 64 MB
    maintenance work mem = 256 MB

    I can either upload the database somewhere, or provide the benchmarking
    script if needed.

    Unfortunately, I can't reproduce such a huge slowdown in my test cases.
    Could you share both the database and the benchmarking script?

    ------
    With best regards,
    Alexander Korotkov.
  • Tomas Vondra at Dec 23, 2012 at 11:48 pm
    Hi!
    On 22.12.2012 17:15, Alexander Korotkov wrote:
    I'm not saying this is a perfect benchmark, but the differences (in
    querying) are pretty huge. I'm not sure where this difference comes from,
    but it seems to be quite consistent (I usually get +-10% variation, which
    is negligible considering the huge difference).

    Is this an expected behaviour that will be fixed by another patch?


    Further patches that significantly accelerate index search will be
    provided. This patch changes only the storage of GIN posting lists/trees.
    However, this patch wasn't expected to change index scan speed
    significantly in either direction.
    That was exactly my expectation - probably not an improvement, but
    definitely not worse performance.
    The database contains ~680k messages from the mailing list archives,
    i.e. about 900 MB of data (in the table), and the GIN index on tsvector
    is about 900MB too. So the whole dataset nicely fits into memory (8GB
    RAM), and it seems to be completely CPU bound (no I/O activity at all).

    The configuration was exactly the same in both cases

    shared buffers = 1GB
    work mem = 64 MB
    maintenance work mem = 256 MB

    I can either upload the database somewhere, or provide the benchmarking
    script if needed.


    Unfortunately, I can't reproduce such a huge slowdown in my test cases.
    Could you share both the database and the benchmarking script?
    It's strange, but no matter what I do I can't reproduce those results
    (with the significant performance decrease). So either I made some
    strange mistake when running those tests, or there was something wrong
    with my system, or whatever :-(

    But when running the benchmarks now (I double-checked everything and
    properly repeated the tests), I've noticed a different behaviour. But
    first, some info about the scripts I use for testing.

    All the scripts are available here:

       https://bitbucket.org/tvondra/archie

    It's my "hobby project" implementing fulltext mbox archive. It should be
    usable but it's still a bit WIP so let me know in case of any issues.

    The README should give you all the instructions on how to set up and load
    the database. I'm using ~1700 mbox files downloaded from
    http://archives.postgresql.org/ for these lists (until 2012/11):

        pgadmin-hackers
        pgsql-advocacy
        pgsql-announce
        pgsql-bugs
        pgsql-general
        pgsql-hackers
        pgsql-jdbc
        pgsql-novice
        pgsql-odbc
        pgsql-patches
        pgsql-sql

    which in the end gives ~677k rows in the 'messages' table, occupying
    ~5.5GB of disk space (including all the indexes etc.).

    Once you have the data loaded, you need to warm up the database and then
    start benchmarking it - I'm using the warmup.py script for both tasks.

    To warm up the DB, just run this:

         ./warmup.py --db archie --duration 300

    until the %util drops near 0 (assuming you have enough RAM to fit the
    whole database into memory). Then I usually run the following as a
    benchmark:

         ./warmup.py --db archie --duration 300 --no-hash \
                                 --no-thread --words 1

         ./warmup.py --db archie --duration 300 --no-hash \
                                 --no-thread --words 2

    which runs 60-second tests and outputs one line per worker (the number of
    workers defaults to the number of CPUs).

    The script itself is very simple: it fetches a random message and uses
    the tsvector column as a source of words for the actual benchmark. It
    takes N words from the tsvector, splits them into groups, and performs a
    simple fulltext query using plainto_tsquery('word1 word2 ...'). At the
    end it prints info including the number of queries per second.

    I've run the tests on the current master with and without the v3 patch.
    I've tested it with 1GB or 2GB shared buffers, and 32MB or 64MB work mem.

    The tests were run for 1, 2, 3, 4 and 5 words, and I repeated them five
    times for each configuration. Each run lasted five minutes.

    These are the averages (from the 5 runs) of queries per second for each
    combination of parameters:

    words:               1     2     3     4     5
    ------------------------------------------------
    master  1GB/32MB    19   179   165   127    99
    patched 1GB/32MB    19   175   163   124    96

    master  1GB/64MB    20   181   165   127    99
    patched 1GB/64MB    19   174   159   120    95

    master  2GB/32MB    27   181   165   127    98
    patched 2GB/32MB    25   176   156   120    93

    master  2GB/64MB    27   180   166   128   102
    patched 2GB/64MB    40   402   364   245   176

    There's no significant difference in performance, except for the
    2GB/64MB combination. And in that case it's actually in the opposite
    direction from what I reported before - i.e. this time it's up to 100%
    faster than the unpatched master. The results are pretty consistent
    (very small variance across the repeated runs), so I'm not sure about
    the previous results.

    Any idea what might cause such behavior? Why should it happen only with
    this particular combination of shared_buffers and work_mem?

    kind regards
    Tomas
  • Craig Ringer at Mar 3, 2013 at 2:53 pm
    The GIN changes don't seem to have progressed in some time, and some of
    the most recent activity
    (http://archives.postgresql.org/message-id/50BFF89A.7080908@fuzzy.cz)
    suggests unconvincing test results.

    Is this work considered to be a dead-end - a good idea that didn't work
    out in practice? Or do you think it still has merit and can be made
    useful and ready for inclusion?

    Given the activity level I would like to bounce this patch, either as
    "returned with feedback" if you want to take another go at it post-9.3,
    or as "rejected" if you think the idea won't go anywhere. Please let me
    know how you think it looks.

    --
      Craig Ringer http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Alexander Korotkov at Mar 3, 2013 at 5:30 pm

    On Sun, Mar 3, 2013 at 6:53 PM, Craig Ringer wrote:

    The GIN changes don't seem to have progressed in some time, and some of
    the most recent activity
    (http://archives.postgresql.org/message-id/50BFF89A.7080908@fuzzy.cz)
    suggests unconvincing test results.
    Actually, the _most_ recent activity shows the inverse:
    http://www.postgresql.org/message-id/50D79861.3030303@fuzzy.cz
    However, this patch itself is not expected to produce significant changes
    in search speed. Test results showing both a twofold speedup and a
    slowdown look strange to me; I can neither reproduce nor explain them.

    Is this work considered to be a dead-end - a good idea that didn't work
    out in practice? Or do you think it still has merit and can be made
    useful and ready for inclusion?
    This patch is only the first of a planned series of GIN improvement
    patches. It doesn't change anything significant in search, only in
    storage. At the moment we are working on the design of the rest of the
    patches in order to put them up for consideration, which has led to the
    lack of attention to this patch.

    Given the activity level I would like to bounce this patch, either as
    "returned with feedback" if you want to take another go at it post-9.3,
    or as "rejected" if you think the idea won't go anywhere. Please let me
    know how you think it looks.
    "Returned with feedback", definitely.

    ------
    With best regards,
    Alexander Korotkov.
  • Craig Ringer at Mar 4, 2013 at 2:54 am
    On 03/04/2013 01:29 AM, Alexander Korotkov wrote:

         Given the activity level I would like to bounce this patch, either as
         "returned with feedback" if you want to take another go at it post-9.3,
         or as "rejected" if you think the idea won't go anywhere. Please let me
         know how you think it looks.


    "Returned with feedback", definitely.

    Done, and thank you for taking the time to explain and to write such a
    clear response; it'll be useful if others have reason to look into the
    same area later.

    --
      Craig Ringer http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
