In some of our applications, we have cases where it would be very nice
if we could activate TOAST at some sort of lower threshold than the
usual 2K that is true now. Let me note the current code that controls
the threshold:

/*
* These symbols control toaster activation. If a tuple is larger than
* TOAST_TUPLE_THRESHOLD, we will try to toast it down to no more than
* TOAST_TUPLE_TARGET bytes. Both numbers include all tuple header overhead
* and between-fields alignment padding, but we do *not* consider any
* end-of-tuple alignment padding; hence the values can be compared directly
* to a tuple's t_len field. We choose TOAST_TUPLE_THRESHOLD with the
* knowledge that toast-table tuples will be exactly that size, and we'd
* like to fit four of them per page with minimal space wastage.
*
* The numbers need not be the same, though they currently are.
*
* Note: sizeof(PageHeaderData) includes the first ItemId, but we have
* to allow for 3 more, if we want to fit 4 tuples on a page.
*/
#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / 4)

We have cases where we're storing XML message information which is
near the 0.5K mark, that being the case, tuples virtually never get
TOASTed.

somesystem=# select min(length(xml)), max(length(xml)), avg(length(xml)), stddev(length(xml)) from table_with_xml;
min | max | avg | stddev
-----+------+----------------------+------------------
244 | 2883 | 651.6900720788174376 | 191.602077911138
(1 row)

I can see four controls as being pretty plausible:

1. Compile time...

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

That's obviously cheapest to the DB engine.

I just made this change to a checkout of CVS HEAD, and it readily
survived a regression test.

2. GUC value for TOAST_DENOMINATOR

Do the above, but with the added detail that TOAST_DENOMINATOR refers
to a GUC value.

I think I could probably make this change; the principle remains much
the same as with #1.

3. GUC value for TOAST_TUPLE_THRESHOLD

This probably has to get modified to the nearest feasible value,
modulo alignment; it's not all that different from #1 or #2.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

At present, the 4 values are essentially advisory; columns get TOASTed
if the column permits EXTENDED storage, but that only occurs if the
size is greater than TOAST_TUPLE_THRESHOLD.

If the new value was chosen, the column would *always* get stored as
TOAST.

Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
#3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

Question:

Which of these sounds preferable?
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>

Search Discussions

  • Tom Lane at Mar 21, 2007 at 6:05 pm

    Chris Browne writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since the
    original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Which of these sounds preferable?
    It's a bit late in the cycle to be proposing any of these for 8.3.

    regards, tom lane
  • Jan Wieck at Mar 21, 2007 at 7:48 pm

    On 3/21/2007 2:05 PM, Tom Lane wrote:
    Chris Browne <cbbrowne@acm.org> writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since the
    original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Not necessarily. A flag in Relation telling if the table has any column
    marked like that could be set while constructing the relcache entry.
    Which of these sounds preferable?
    It's a bit late in the cycle to be proposing any of these for 8.3.
    Certainly.


    Jan

    --
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me. #
    #================================================== JanWieck@Yahoo.com #
  • Bruce Momjian at Mar 21, 2007 at 8:12 pm
    Is this a TODO?

    ---------------------------------------------------------------------------

    Jan Wieck wrote:
    On 3/21/2007 2:05 PM, Tom Lane wrote:
    Chris Browne <cbbrowne@acm.org> writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since the
    original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Not necessarily. A flag in Relation telling if the table has any column
    marked like that could be set while constructing the relcache entry.
    Which of these sounds preferable?
    It's a bit late in the cycle to be proposing any of these for 8.3.
    Certainly.


    Jan

    --
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me. #
    #================================================== JanWieck@Yahoo.com #

    ---------------------------(end of broadcast)---------------------------
    TIP 2: Don't 'kill -9' the postmaster
    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Luke Lonergan at Mar 22, 2007 at 3:10 pm
    I advocate the following:

    - Enable specification of TOAST policy on a per column basis

    As a first step, then:

    - Enable vertical partitioning of tables using per-column specification of
    storage policy.

    - Luke

    On 3/21/07 1:12 PM, "Bruce Momjian" wrote:


    Is this a TODO?

    ---------------------------------------------------------------------------

    Jan Wieck wrote:
    On 3/21/2007 2:05 PM, Tom Lane wrote:
    Chris Browne <cbbrowne@acm.org> writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from
    TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since the
    original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Not necessarily. A flag in Relation telling if the table has any column
    marked like that could be set while constructing the relcache entry.
    Which of these sounds preferable?
    It's a bit late in the cycle to be proposing any of these for 8.3.
    Certainly.


    Jan

    --
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me. #
    #================================================== JanWieck@Yahoo.com #

    ---------------------------(end of broadcast)---------------------------
    TIP 2: Don't 'kill -9' the postmaster
  • Andreas Pflug at Mar 22, 2007 at 4:42 pm

    Luke Lonergan wrote:
    I advocate the following:

    - Enable specification of TOAST policy on a per column basis

    As a first step, then:

    - Enable vertical partitioning of tables using per-column specification of
    storage policy.
    Wouldn't it be enough to enable having the toast table on a different
    table space?

    Regards,
    Andreas
  • Luke Lonergan at Mar 22, 2007 at 5:19 pm
    Andreas,
    On 3/22/07 9:40 AM, "Andreas Pflug" wrote:

    Wouldn't it be enough to enable having the toast table on a different
    table space?
    Yes, but the ultimate goal would allow the allocation of a storage mechanism
    that is unique to each column. The most frequently used mechanism for our
    customers would likely be to have each column associated with it's own
    internal relation (like a separate TOAST table), which puts each column into
    it's own dense page storage.

    Beside the advantages of separating out keys columns from data columns,
    compression and encryption approaches that use column packed data are much
    more effective.

    As Tom points out there are complications WRT update, access, etc that need
    to be worked out to support this, but it's an important capability to have
    IMO.

    - Luke
  • Chris Browne at Mar 22, 2007 at 8:04 pm

    "Luke Lonergan" writes:
    Andreas,
    On 3/22/07 9:40 AM, "Andreas Pflug" wrote:

    Wouldn't it be enough to enable having the toast table on a different
    table space?
    Yes, but the ultimate goal would allow the allocation of a storage mechanism
    that is unique to each column. The most frequently used mechanism for our
    customers would likely be to have each column associated with it's own
    internal relation (like a separate TOAST table), which puts each column into
    it's own dense page storage.

    Beside the advantages of separating out keys columns from data columns,
    compression and encryption approaches that use column packed data are much
    more effective.

    As Tom points out there are complications WRT update, access, etc that need
    to be worked out to support this, but it's an important capability to have
    IMO.
    Hmm. Are you trying to do something sort of like CStore?

    http://db.csail.mit.edu/projects/cstore/

    That seems to have some relevant ideas...
    --
    let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
    http://cbbrowne.com/info/finances.html
    Where do you *not* want to go today? "Confutatis maledictis, flammis
    acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>
  • Luke Lonergan at Mar 22, 2007 at 9:56 pm
    Chris,
    Hmm. Are you trying to do something sort of like CStore?

    http://db.csail.mit.edu/projects/cstore/

    That seems to have some relevant ideas...
    I think "something like" is a good way to put it. As you know Cstore was a
    prototype for Vertica and these are in the same class as SybaseIQ and
    SandDB.

    The huge drawback of the pure column approach is update/insert while query
    is difficult if not impossible. I think there are hybrid approaches that
    yield most, if not all of the benefits of the column store approach without
    the disadvantages.

    For instance, a bitmap index with index only access in a row-store may
    outperform the column store on queries. Note the "index only access" part.
    The next advantage of a column store is deep compression, preserved through
    the executor access path - we can do this with selective vertical
    partitioning using a page-segmented WAH compression similar to what we did
    with bitmap index. Lastly, vectorization of the operators in the executor
    can be implemented with vertical partitioning and an access path that does
    projection before feeding the columns into the executor - this can be done
    in Postgres with a cache-bypass method. Some of this requires working out
    answers to the visibility challenges inherent to MVCC, but it's all possible
    IMO.

    So - under the guise of "TOASTing smaller things", it seems relevant to
    think about vertical partitioning, perhaps making use of what's already in
    Postgres as baby steps toward more advanced features.

    - Luke
  • Hannu Krosing at Mar 22, 2007 at 10:24 pm

    Ühel kenal päeval, N, 2007-03-22 kell 10:19, kirjutas Luke Lonergan:
    Andreas,
    On 3/22/07 9:40 AM, "Andreas Pflug" wrote:

    Wouldn't it be enough to enable having the toast table on a different
    table space?
    Yes, but the ultimate goal would allow the allocation of a storage mechanism
    that is unique to each column. The most frequently used mechanism for our
    customers would likely be to have each column associated with it's own
    internal relation (like a separate TOAST table), which puts each column into
    it's own dense page storage.
    Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration ?

    btw, it may be a good idea to have a verion of bizgresMPP which has
    monetdb as partition db, if monetdb is as efficient as they tell it is .
    Beside the advantages of separating out keys columns from data columns,
    compression and encryption approaches that use column packed data are much
    more effective.

    As Tom points out there are complications WRT update, access, etc that need
    to be worked out to support this, but it's an important capability to have
    IMO.

    - Luke



    ---------------------------(end of broadcast)---------------------------
    TIP 7: You can help support the PostgreSQL project by donating at

    http://www.postgresql.org/about/donate
    --
    ----------------
    Hannu Krosing
    Database Architect
    Skype Technologies OÜ
    Akadeemia tee 21 F, Tallinn, 12618, Estonia

    Skype me: callto:hkrosing
    Get Skype for free: http://www.skype.com
  • Luke Lonergan at Mar 22, 2007 at 11:28 pm
    Hi Hannu,
    On 3/22/07 3:21 PM, "Hannu Krosing" wrote:

    Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration ?

    btw, it may be a good idea to have a verion of bizgresMPP which has
    monetdb as partition db, if monetdb is as efficient as they tell it is .
    Yep - I've talked this over with the MonetDB folks in the past. The major
    benefits they observe are those same things we talk about here,
    implementation of long loops for operators and de-abstraction of operations
    like compare() when appropriate, say comparing two INT columns in a sort.

    We can get many of those benefits without by "vectorizing" the executor of
    PostgreSQL even without the full column partitioning. We're in the midst of
    working some of those changes as we speak. Early indications are that we
    see large performance gains from this approach. Note that the actual
    instruction counts per row don't change, but the more effective use of L2 I
    and D cache and superscaler instruction units on the CPU create the big
    gains. The MonetDB people present some effective literature on this, but
    the important gains mostly come from the "vectorization", not the operator
    de-abstraction IMO, which is good news for us all.

    - Luke
  • Bruce Momjian at Mar 27, 2007 at 12:40 am

    Luke Lonergan wrote:
    I advocate the following:

    - Enable specification of TOAST policy on a per column basis

    As a first step, then:

    - Enable vertical partitioning of tables using per-column specification of
    storage policy.
    How are these different from ALTER TABLE SET STORAGE? They need to be
    more specific.

    ---------------------------------------------------------------------------

    - Luke

    On 3/21/07 1:12 PM, "Bruce Momjian" wrote:


    Is this a TODO?

    ---------------------------------------------------------------------------

    Jan Wieck wrote:
    On 3/21/2007 2:05 PM, Tom Lane wrote:
    Chris Browne <cbbrowne@acm.org> writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from
    TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since the
    original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Not necessarily. A flag in Relation telling if the table has any column
    marked like that could be set while constructing the relcache entry.
    Which of these sounds preferable?
    It's a bit late in the cycle to be proposing any of these for 8.3.
    Certainly.


    Jan

    --
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me. #
    #================================================== JanWieck@Yahoo.com #

    ---------------------------(end of broadcast)---------------------------
    TIP 2: Don't 'kill -9' the postmaster
    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Luke Lonergan at Apr 10, 2007 at 2:31 pm
    Hi Bruce,

    How about these:

    - Allow specification of TOAST size threshold in bytes on a per column basis
    - Enable storage of columns in separate TOAST tables
    - Enable use of multi-row compression method(s) for TOAST tables

    - Luke

    On 3/26/07 5:39 PM, "Bruce Momjian" wrote:

    Luke Lonergan wrote:
    I advocate the following:

    - Enable specification of TOAST policy on a per column basis

    As a first step, then:

    - Enable vertical partitioning of tables using per-column specification of
    storage policy.
    How are these different from ALTER TABLE SET STORAGE? They need to be
    more specific.
  • Bruce Momjian at Apr 12, 2007 at 4:24 pm

    Luke Lonergan wrote:
    Hi Bruce,

    How about these:

    - Allow specification of TOAST size threshold in bytes on a per column basis
    - Enable storage of columns in separate TOAST tables
    - Enable use of multi-row compression method(s) for TOAST tables
    At this point I would be happy just to set the TOAST threshold to a
    value defined as optimal, rather than as the most minimal use of TOAST
    possible.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Luke Lonergan at Apr 12, 2007 at 8:41 pm
    Hi Bruce,
    On 4/12/07 9:24 AM, "Bruce Momjian" wrote:

    Luke Lonergan wrote:
    Hi Bruce,

    How about these:

    - Allow specification of TOAST size threshold in bytes on a per column basis
    - Enable storage of columns in separate TOAST tables
    - Enable use of multi-row compression method(s) for TOAST tables
    At this point I would be happy just to set the TOAST threshold to a
    value defined as optimal, rather than as the most minimal use of TOAST
    possible.
    I agree that's a good starting point, I guess I was thinking that was
    already included in the work that Tom has been doing. If not, we can add a
    TODO like this as a precursor to the ones above:

    - Allow specification of TOAST size threshold (in bytes) on a per table
    basis

    - Luke
  • Tom Lane at Apr 12, 2007 at 8:41 pm

    "Luke Lonergan" <llonergan@greenplum.com> writes:
    On 4/12/07 9:24 AM, "Bruce Momjian" wrote:
    At this point I would be happy just to set the TOAST threshold to a
    value defined as optimal, rather than as the most minimal use of TOAST
    possible.
    I agree that's a good starting point, I guess I was thinking that was
    already included in the work that Tom has been doing.
    No. I put in the code needed to decouple toast tuple size from toasting
    threshold, but I don't have the time or interest to run performance
    tests to see whether there are better default values than the historical
    quarter-page values. Someone should do that before 8.3 beta ...
    If not, we can add a
    TODO like this as a precursor to the ones above:
    - Allow specification of TOAST size threshold (in bytes) on a per table
    basis
    I would suggest that *all* of those TODOs are premature in the absence
    of experimental evidence about the effect of varying the parameters.
    If we end up finding out that the existing settings are about right
    anyway across a range of test cases, who needs more knobs? We've got
    too many mostly-useless knobs already.

    regards, tom lane
  • Luke Lonergan at Apr 12, 2007 at 9:07 pm
    Hi Tom,
    On 4/12/07 1:40 PM, "Tom Lane" wrote:

    I would suggest that *all* of those TODOs are premature in the absence
    of experimental evidence about the effect of varying the parameters.
    If we end up finding out that the existing settings are about right
    anyway across a range of test cases, who needs more knobs? We've got
    too many mostly-useless knobs already.
    This set of TODOs is really about a binary change in behavior that data
    warehouse users will employ to shift given columns into a separate storage
    mechanism while preserving their schema. By contrast, the knob you describe
    is about tuning the existing storage mechanism by offlining values that are
    too large.

    We can talk about these TODOs as a group using the name "vertical
    partitioning" if that suits.

    To demonstrate the effectiveness of vertical partitioning, we would write
    queries that use the partitioned columns independently or as groups that
    correlate with the storage mechanism.

    The other demonstration applies to the use of compression techniques that
    align with the column type(s) and operate across tuple boundaries within
    pages. Examples there include the segmentation of fixed width types and
    variable width types into separate page storage classes, which allows for
    the application of different compression and/or representations on pages.

    - Luke
  • Gregory Stark at Apr 12, 2007 at 9:12 pm

    "Tom Lane" <tgl@sss.pgh.pa.us> writes:

    I would suggest that *all* of those TODOs are premature in the absence
    of experimental evidence about the effect of varying the parameters.
    If we end up finding out that the existing settings are about right
    anyway across a range of test cases, who needs more knobs? We've got
    too many mostly-useless knobs already.
    Isn't it obvious that the "right" value is going to depend extraordinarily
    heavily on the precise usage pattern though?

    A typical table with 100-byte columns which are normally read with the rest of
    the columns, sometimes in sequential scans or updates, will find TOASTing them
    counter-productive as it t urns those all into additional random access i/o
    and extra inserts and deletes.

    Another system with a dozen 100-byte columns that are virtually never accessed
    and a handful of heavily-updated integer columns will benefit heavily from
    TOAST as it changes the table from an awfully performing 5-6 tuple/page table
    into a tremendously performing 100+ tuple/page table.

    We're not ever going to be able to predict when data is being stored what
    future usage pattern to expect, at least not realistically. The best we can
    hope for is to find the tipping point at which the cost if we guess wrong is
    some tolerable level of pain and set that to be the default and giving the
    tools to the user to obtain the benefit in the cases where he knows it'll
    help.

    Or perhaps TOAST is the wrong kind of vertical partitioning for this. Perhaps
    we should be using TOAST to deal with the "large datum" problem and have a
    different looking tool entirely for the "vertical partitioning rarely used
    columns" problem.

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Apr 12, 2007 at 9:19 pm

    Gregory Stark writes:
    "Tom Lane" <tgl@sss.pgh.pa.us> writes:
    I would suggest that *all* of those TODOs are premature in the absence
    of experimental evidence about the effect of varying the parameters.
    Isn't it obvious that the "right" value is going to depend extraordinarily
    heavily on the precise usage pattern though?
    It's not yet "obvious" that there's any win to be had at all. AFAIK no
    one has done any playing around with alternative TOAST settings. It
    could be that the mechanism is simply not very sensitive to those values.
    Or perhaps TOAST is the wrong kind of vertical partitioning for this.
    Exactly my point --- quoting anecdotes about wins from someone else's
    vertical partitioning implementation doesn't really prove a darn thing
    about how TOAST will behave. It's suggestive, but I'd like to see some
    actual experimental evidence before we start constructing a lot of
    infrastructure. "Infrastructure first, learn how to use it later" is
    the philosophy that's given us nigh-useless stuff like commit_delay.

    regards, tom lane
  • Chris Browne at Mar 21, 2007 at 8:29 pm

    Tom Lane writes:
    Chris Browne <cbbrowne@acm.org> writes:
    #define TOAST_DENOMINATOR 17
    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */
    #define TOAST_TUPLE_THRESHOLD^I\
    ^IMAXALIGN_DOWN((BLCKSZ - \
    ^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
    ^I^I^I^I / TOAST_DENOMINATOR)
    Given that you are quoting code that was demonstrably broken since
    the original coding of TOAST up till a month or two back, "it passes
    regression" is not adequate proof of "it's right". In fact I think
    it's not right; you have not got the roundoff condition straight.
    OK, then maybe some refinement was needed. That seemed too easy.

    Mind you, the problem seems to me to be that TOAST_TUPLE_THRESHOLD is
    not solely a threshold to compare things to (as done in
    heapam.c/toasting.c), but gets reused to calculate
    TOAST_MAX_CHUNK_SIZE. If the threshold was solely used as that,
    alignment wouldn't matter.

    FYI, I took a sample table and loaded it into the resulting 8.3
    backend based on the us of the totally naive TOAST_DENOMINATOR; there
    may be something off in the sizing of the chunks, but that does not
    appear to have injured fidelity of the tuples I stored.

    Vacuum output:

    --- Production system (no TOASTing)
    INFO: "xml_log_table": found 0 removable, 1731329 nonremovable row versions in 175870 pages
    DETAIL: 0 dead row versions cannot be removed yet.

    --- In the 8.3 instance that did toast things...
    INFO: "xml_log_table": found 0 removable, 1730737 nonremovable row versions in 41120 pages
    DETAIL: 0 dead row versions cannot be removed yet.
    There were 0 unused item pointers.
    610 pages contain useful free space.
    0 pages are entirely empty.
    CPU 1.08s/0.36u sec elapsed 14.94 sec.
    INFO: vacuuming "pg_toast.pg_toast_49194"
    INFO: index "pg_toast_49194_index" now contains 2303864 row versions in 6319 pages
    DETAIL: 0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.09s/0.03u sec elapsed 1.71 sec.
    INFO: "pg_toast_49194": found 0 removable, 2303864 nonremovable row versions in 98191 pages
    DETAIL: 0 dead row versions cannot be removed yet.
    There were 0 unused item pointers.
    514 pages contain useful free space.
    0 pages are entirely empty.

    Problem with alignment of TOAST_MAX_CHUNK_SIZE or not, I seem to be
    getting the right results, and this nicely partitions the table into 2
    chunks, one, with the non-XML data, that occupies 41K pages, and the
    TOAST section storing those less-frequently-accessed columns. (There
    is a size difference; the production instance has more empty space
    since it sees active inserts + deletes.)

    In all ways except for "strict hygenic correctness of code," this
    accomplished what I was hoping.

    If someone could make a round-off-safe calculation of
    TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE that exposed the
    denominator so it could be safely modified, that would be one step
    ahead... I generally try not to make changes to the core, so I'll try
    to avoid that...
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.
    Anything along this line would require invoking the toaster on every
    single tuple, since we'd always have to crawl through all the columns
    to see if toasting was supposed to happen. No thanks.
    Ah, I see. I infer from that that the code starts by checking to see
    if the tuple size is > TOAST_TUPLE_THRESHOLD, and only starts
    rummaging through TOAST infrastructure if the tuple is big enough.

    In that case, "TOAST by default" becomes rather a nonstarter, I agree.
    In the application context I'm thinking of, one table out of ~80 is a
    "TOAST candidate;" making access to the other 79 slower would not be
    of benefit.

    (Aside: I'll link to Simon Rigg's related note, as well as to the item
    on the TODO list...)
    <http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php>
    <http://www.postgresql.org/docs/faqs.TODO.html>
    --
    let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
    http://cbbrowne.com/info/finances.html
    Where do you *not* want to go today? "Confutatis maledictis, flammis
    acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>
  • Jim C. Nasby at Mar 21, 2007 at 10:23 pm

    On Wed, Mar 21, 2007 at 12:37:36PM -0400, Chris Browne wrote:
    4. A different mechanism would be to add a fifth storage column
    strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
    say, TOAST.

    At present, the 4 values are essentially advisory; columns get TOASTed
    if the column permits EXTENDED storage, but that only occurs if the
    size is greater than TOAST_TUPLE_THRESHOLD.

    If the new value was chosen, the column would *always* get stored as
    TOAST.
    Rather than a hard and fast limit of 0, why not allow defining a size
    threshold? And while we're at it, how about a size threshold for when to
    try compressing, too?
    Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
    #3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

    Question:

    Which of these sounds preferable?
    1 and 2 (variations on how to set the denominator) sound completely
    ugly. Trying to minimize wasted space in a toast table is great for a
    default, but exposing something like that to the users via any kind of
    setting seems way to obtuse.

    #3 (GUC for number of bytes) may not make sense for performance reasons,
    as Tom mentioned. I'm hoping that it would be easy to check either
    pg_class or pg_attribute to see if a table/column has non-standard
    toast/compression limits.
    --
    Jim Nasby jim@nasby.net
    EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedMar 21, '07 at 4:44p
activeApr 12, '07 at 9:19p
posts21
users9
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase