Hi all,

I wrote an experimental patch for a vertical partitioning
function.

To build the function easily, I reused the TOAST code. In short,
the column that the user specifies is forcibly moved out-of-line
via the TOAST mechanism.

In our environment, we saw a performance gain of about 10% in the
DBT-2 benchmark by moving c_data of the customer table
out-of-line.

TOAST is an over-engineered mechanism for vertical partitioning:
its processing overhead is large, so performance may even degrade
depending on the environment.

There are still many things to consider before vertical
partitioning is implemented properly; for instance, omitting the
TOAST index and using ctid directly for the link.

Your comments are welcome. Thanks.

---
How To Use
---
Use the "ALTER TABLE" command:
http://www.postgresql.org/docs/8.1/static/sql-altertable.html

ALTER TABLE name ALTER COLUMN column SET STORAGE FORCEEXTERNAL;

I am not sure whether "FORCEEXTERNAL" is an appropriate keyword;
please suggest a better one.
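
For example, to force the c_data column of DBT-2's customer table
out-of-line (a hypothetical session; it assumes a server built with
this patch):

```sql
-- Requires a PostgreSQL 8.1 server built with this experimental patch.
-- Force every value of c_data out-of-line via the TOAST machinery:
ALTER TABLE customer ALTER COLUMN c_data SET STORAGE FORCEEXTERNAL;

-- The standard strategies remain available to switch back:
ALTER TABLE customer ALTER COLUMN c_data SET STORAGE EXTENDED;
```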


--
Junji Teramoto

  • Jim C. Nasby at Dec 8, 2005 at 5:34 am
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is? (Other
    than it probably should have gone to -patches...)
    On Thu, Dec 01, 2005 at 05:59:08PM +0900, Junji TERAMOTO wrote:
    Hi all,

    I wrote an experimental patch for a vertical partitioning
    function.

    To build the function easily, I reused the TOAST code. In short,
    the column that the user specifies is forcibly moved out-of-line
    via the TOAST mechanism.

    In our environment, we saw a performance gain of about 10% in the
    DBT-2 benchmark by moving c_data of the customer table
    out-of-line.

    TOAST is an over-engineered mechanism for vertical partitioning:
    its processing overhead is large, so performance may even degrade
    depending on the environment.

    There are still many things to consider before vertical
    partitioning is implemented properly; for instance, omitting the
    TOAST index and using ctid directly for the link.

    Your comments are welcome. Thanks.

    ---
    How To Use
    ---
    Use the "ALTER TABLE" command:
    http://www.postgresql.org/docs/8.1/static/sql-altertable.html

    ALTER TABLE name ALTER COLUMN column SET STORAGE FORCEEXTERNAL;

    I am not sure whether "FORCEEXTERNAL" is an appropriate keyword;
    please suggest a better one.


    --
    Junji Teramoto
    diff -purN postgresql-8.1.0.org/src/backend/access/heap/heapam.c postgresql-8.1.0/src/backend/access/heap/heapam.c
    --- postgresql-8.1.0.org/src/backend/access/heap/heapam.c 2005-10-15 11:49:08.000000000 +0900
    +++ postgresql-8.1.0/src/backend/access/heap/heapam.c 2005-12-01 15:31:38.307713257 +0900
    @@ -1070,6 +1070,36 @@ heap_get_latest_tid(Relation relation,
    } /* end of loop */
    }

    +// Add by junji from here
    +/*
    + * has_rel_forceexternal - does the relation have any "SET STORAGE FORCEEXTERNAL" columns?
    + */
    +bool
    +has_rel_forceexternal(Relation relation)
    +{
    + TupleDesc tupleDesc;
    + Form_pg_attribute *att;
    + int numAttrs;
    + int i;
    +
    + /*
    + * Get the tuple descriptor and break down the tuple(s) into fields.
    + */
    + tupleDesc = relation->rd_att;
    + att = tupleDesc->attrs;
    + numAttrs = tupleDesc->natts;
    +
    + for (i = 0; i < numAttrs; i++)
    + {
    + if (att[i]->attstorage == 'f')
    + return true;
    + }
    +
    + return false;
    +}
    +// Add by junji to here
    +
    +
    /*
    * heap_insert - insert tuple into a heap
    *
    @@ -1130,6 +1160,9 @@ heap_insert(Relation relation, HeapTuple
    * out-of-line attributes from some other relation, invoke the toaster.
    */
    if (HeapTupleHasExternal(tup) ||
    +// Add by junji from here
    + (has_rel_forceexternal(relation)) ||
    +// Add by junji to here
    (MAXALIGN(tup->t_len) > TOAST_TUPLE_THRESHOLD))
    heap_tuple_toast_attrs(relation, tup, NULL);

    @@ -1762,6 +1795,9 @@ l2:
    */
    need_toast = (HeapTupleHasExternal(&oldtup) ||
    HeapTupleHasExternal(newtup) ||
    +// Add by junji from here
    + (has_rel_forceexternal(relation)) ||
    +// Add by junji to here
    (MAXALIGN(newtup->t_len) > TOAST_TUPLE_THRESHOLD));

    newtupsize = MAXALIGN(newtup->t_len);
    diff -purN postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c postgresql-8.1.0/src/backend/access/heap/tuptoaster.c
    --- postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c 2005-10-15 11:49:09.000000000 +0900
    +++ postgresql-8.1.0/src/backend/access/heap/tuptoaster.c 2005-12-01 15:29:29.722579466 +0900
    @@ -512,6 +512,46 @@ toast_insert_or_update(Relation rel, Hea
    }
    }

    +// Add by junji from here
    + /*
    + * We look for attributes of attstorage 'f'.
    + */
    + if (rel->rd_rel->reltoastrelid != InvalidOid)
    + {
    + Datum old_value;
    +
    + /*------
    + * Force out-of-line storage for any still-inlined attribute
    + * whose attstorage is 'f'
    + *------
    + */
    + for (i = 0; i < numAttrs; i++)
    + {
    + if (toast_action[i] == 'p')
    + continue;
    + if (VARATT_IS_EXTERNAL(toast_values[i]))
    + continue;
    + if (att[i]->attstorage != 'f')
    + continue;
    +
    + /*
    + * Store this external
    + */
    + old_value = toast_values[i];
    + toast_action[i] = 'p';
    + toast_values[i] = toast_save_datum(rel, toast_values[i]);
    + if (toast_free[i])
    + pfree(DatumGetPointer(old_value));
    +
    + toast_free[i] = true;
    + toast_sizes[i] = VARATT_SIZE(toast_values[i]);
    +
    + need_change = true;
    + need_free = true;
    + }
    + }
    +// Add by junji to here
    +
    /* ----------
    * Compress and/or save external until data fits into target length
    *
    diff -purN postgresql-8.1.0.org/src/backend/commands/tablecmds.c postgresql-8.1.0/src/backend/commands/tablecmds.c
    --- postgresql-8.1.0.org/src/backend/commands/tablecmds.c 2005-10-15 11:49:15.000000000 +0900
    +++ postgresql-8.1.0/src/backend/commands/tablecmds.c 2005-12-01 15:29:29.726577573 +0900
    @@ -3439,6 +3439,10 @@ ATExecSetStorage(Relation rel, const cha
    newstorage = 'x';
    else if (pg_strcasecmp(storagemode, "main") == 0)
    newstorage = 'm';
    +// Add by junji from here
    + else if (pg_strcasecmp(storagemode, "forceexternal") == 0)
    + newstorage = 'f';
    +// Add by junji to here
    else
    {
    ereport(ERROR,
    @@ -6045,8 +6049,14 @@ needs_toast_table(Relation rel)
    tupdesc = rel->rd_att;
    att = tupdesc->attrs;

    +
    +
    for (i = 0; i < tupdesc->natts; i++)
    {
    +// Add by junji from here
    + if (att[i]->attstorage == 'f')
    + return true;
    +// Add by junji to here
    if (att[i]->attisdropped)
    continue;
    data_length = att_align(data_length, att[i]->attalign);
    diff -purN postgresql-8.1.0.org/src/include/access/heapam.h postgresql-8.1.0/src/include/access/heapam.h
    --- postgresql-8.1.0.org/src/include/access/heapam.h 2005-10-15 11:49:42.000000000 +0900
    +++ postgresql-8.1.0/src/include/access/heapam.h 2005-12-01 15:29:29.726577573 +0900
    @@ -155,6 +155,10 @@ extern void heap_get_latest_tid(Relation
    ItemPointer tid);
    extern void setLastTid(const ItemPointer tid);

    +// Add by junji from here
    +extern bool has_rel_forceexternal(Relation relation);
    +// Add by junji to here
    +
    extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
    bool use_wal, bool use_fsm);
    extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
    ---------------------------(end of broadcast)---------------------------
    TIP 1: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postgresql.org so that your
    message can get through to the mailing list cleanly

    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Tom Lane at Dec 8, 2005 at 5:59 am

    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)

    The poster was not actually suggesting applying it in the form of a
    force-external TOAST strategy; he was using this as a prototype to try
    to interest people in the idea of out-of-line storage mechanisms with
    lower overhead than TOAST. But that part is all speculation not code.

    Personally, I'd rather look into whether we couldn't speed up TOAST
    without changing any of its basic assumptions. The current
    implementation isn't awful, but it was built to allow the existing table
    and index mechanisms to be re-used for TOAST data. Now that we know for
    certain TOAST is a good idea, it would be reasonable to take a second
    look at whether we could improve the performance with another round of
    implementation effort.

    regards, tom lane
  • Jim C. Nasby at Dec 8, 2005 at 7:34 am

    On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.

    Presumably this would need to be settable on at least a per-table basis.

    Would adding such a variable be a good beginner TODO, or is it too
    invasive?
    Personally, I'd rather look into whether we couldn't speed up TOAST
    without changing any of its basic assumptions. The current
    implementation isn't awful, but it was built to allow the existing table
    and index mechanisms to be re-used for TOAST data. Now that we know for
    certain TOAST is a good idea, it would be reasonable to take a second
    look at whether we could improve the performance with another round of
    implementation effort.
    I've often wondered about all the overhead of storing toast data in what
    amounts to a regular table. Sounds like another TODO...
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Junji TERAMOTO at Dec 8, 2005 at 8:51 am
    Hello all,

    Thank you for having the interest.

    Jim C. Nasby wrote:
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.
    Yes, our team thinks this patch is effective because it lets a tuple
    be updated partially.
    For instance, DBT-2 frequently updates the customer table's columns
    other than c_data. Because c_data (about 400 bytes of a roughly
    500-byte tuple) is copied along on every update, we believe it has
    been hurting performance.
    That aspect is more important to us than the vertical partitioning
    function itself.
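
A rough back-of-the-envelope model of the heap write volume per
update, using the approximate sizes above (500-byte tuple, ~400 bytes
of it in c_data; the 20-byte pointer size is an assumption for
illustration):

```python
# Simplified model: how many heap bytes one UPDATE rewrites.
TUPLE_SIZE = 500     # bytes, whole customer tuple (from the thread)
C_DATA_SIZE = 400    # bytes, the c_data column (from the thread)
TOAST_POINTER = 20   # bytes, assumed size of an out-of-line pointer

# Inline storage: every UPDATE writes a complete new tuple version.
inline_write = TUPLE_SIZE

# c_data forced out-of-line: an UPDATE that leaves c_data unchanged
# rewrites only the remaining columns plus the small pointer.
external_write = (TUPLE_SIZE - C_DATA_SIZE) + TOAST_POINTER

print(inline_write, external_write)             # 500 120
print(round(external_write / inline_write, 2))  # 0.24
```

Under this model an update that skips c_data rewrites roughly a
quarter of the bytes, which is consistent with the DBT-2 gain
reported above.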

    Of course, changing the table's DDL is another way to achieve this.
    However, I think the feature could be useful when the DDL cannot be
    changed.

    As Tom pointed out, this is a patch to verify the idea.
    I would like to know how interested the community is in partial
    updates, and, of course, whether people want a vertical partitioning
    function in PostgreSQL, too. :-)


    By the way, should I send the patch to -patches again?

    --
    Junji Teramoto / teramoto.junji (a) lab.ntt.co.jp
  • Bruce Momjian at Dec 8, 2005 at 3:03 pm

    Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.

    Presumably this would need to be settable on at least a per-table basis.

    Would adding such a variable be a good beginner TODO, or is it too
    invasive?
    Well, we have now:

    ALTER TABLE ALTER [ COLUMN ] column
    SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

    What else is needed?

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.pha.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073
  • Jim C. Nasby at Dec 8, 2005 at 6:43 pm

    On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
    Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.

    Presumably this would need to be settable on at least a per-table basis.

    Would adding such a variable be a good beginner TODO, or is it too
    invasive?
    Well, we have now:

    ALTER TABLE ALTER [ COLUMN ] column
    SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

    What else is needed?
    As Tom suggested, I think it would be best to be able to change the size
    at which a field gets stored externally. I think it also makes sense
    to let this reverse the normal order (compress first, then store
    externally if it still doesn't fit). I foresee this typically being
    useful
    when you have fields that are between ~100 and 1000 bytes in size, and
    I'm doubtful that compression would do much good there. But I wouldn't
    rule out this being useful on fields that can also sometimes contain
    much larger amounts of data, so I don't think it makes sense to disable
    compression completely. So, I think this leaves two new options:

    SET STORAGE EXTERNAL [THRESHOLD x]
    If a field is over x in size, it's stored externally.

    SET STORAGE EXTENDED [THRESHOLD x]
    If a field is over x in size, it's stored externally. If it's over
    BLCKSZ/4 it will also be compressed (I think that's how things work
    now).

    Actually, that's rather ugly. I think it would be better to just break
    external storage and compression out into their own attributes:

    SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]

    ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be stored externally. May be specified along with ALLOW
    COMPRESSION.

    ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Jan Wieck at Dec 8, 2005 at 7:04 pm

    On 12/8/2005 1:42 PM, Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
    Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.

    Presumably this would need to be settable on at least a per-table basis.

    Would adding such a variable be a good beginner TODO, or is it too
    invasive?
    Well, we have now:

    ALTER TABLE ALTER [ COLUMN ] column
    SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

    What else is needed?
    As Tom suggested, I think it would be best to be able to change the size
    at which a field gets stored externally. I think it also makes sense to
    have this reverse the normal order of compress first, then if it still
    doesn't fit store it externally. I foresee this typically being useful
    when you have fields that are between ~100 and 1000 bytes in size, and
    I'm doubtful that compression would do much good there. But I wouldn't
    rule out this being useful on fields that can also sometimes contain
    much larger amounts of data, so I don't think it makes sense to disable
    compression completely. So, I think this leaves two new options:
    It's not the size of a field that triggers toasting. It is the size
    of the entire tuple. As long as that is > BLCKSZ/4, the toaster will
    pick the currently largest inline value and do "something" with it.
    "something" is either compressing it or (if that is not allowed or
    already done) moving it external.
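
    The behavior described here can be sketched as a loop over the
    tuple's attributes; this is a heavily simplified model of
    tuptoaster.c (compression is pretended to halve a value, a toast
    pointer is modeled as 20 bytes, and 'm'/main handling is glossed
    over), with BLCKSZ assumed to be 8192:

```python
BLCKSZ = 8192
THRESHOLD = BLCKSZ // 4  # tuple size that triggers the toaster

def toast_tuple(attrs):
    """attrs: list of dicts with keys 'size', 'storage' (one of
    'p', 'x', 'e', 'm'), 'compressed', 'external'.
    Returns the final inline tuple size (simplified sketch)."""
    def tuple_size():
        return sum(a['size'] for a in attrs)

    while tuple_size() > THRESHOLD:
        # Pick the currently largest inline, non-'plain' attribute.
        candidates = [a for a in attrs
                      if not a['external'] and a['storage'] != 'p']
        if not candidates:
            break
        victim = max(candidates, key=lambda a: a['size'])
        if victim['storage'] == 'x' and not victim['compressed']:
            victim['size'] //= 2       # pretend compression halves it
            victim['compressed'] = True
        else:
            victim['size'] = 20        # replace with a toast pointer
            victim['external'] = True
    return tuple_size()
```

    Note that no individual field size is consulted in isolation: a
    400-byte field in a small tuple is never toasted, which is exactly
    the point being made here.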


    Jan
    SET STORAGE EXTERNAL [THRESHOLD x]
    If a field is over x in size, it's stored externally.

    SET STORAGE EXTENDED [THRESHOLD x]
    If a field is over x in size, it's stored externally. If it's over
    BLCKSZ/4 it will also be compressed (I think that's how things work
    now).

    Actually, that's rather ugly. I think it would be better to just break
    external storage and compression out into their own attributes:

    SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]

    ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be stored externally. May be specified along with ALLOW
    COMPRESSION.

    ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

    --
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me. #
    #================================================== JanWieck@Yahoo.com #
  • Bruce Momjian at Mar 3, 2006 at 3:17 am
    Is there still interest in this idea for TODO?

    ---------------------------------------------------------------------------

    Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
    Jim C. Nasby wrote:
    On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
    "Jim C. Nasby" <jim@nasby.net> writes:
    This seems like a useful feature to add, allowing for easy built-in
    vertical partitioning. Are there issues with the patch as-is?
    Other than the ones mentioned by the poster?

    It seemed to me more like a not-too-successful experiment than something
    ready for application. If you take the viewpoint that this is just
    another TOAST storage strategy, I think it's pretty useless. A large
    field value is going to get toasted anyway with the regular strategy,
    and if your column happens to contain some values that are not large,
    forcing them out-of-line anyway is simply silly. (You could make a case
    for making the threshold size user-controllable, but I don't see the
    case for setting the threshold to zero, which is what this amounts to.)
    Valid point. I do think there's a lot of benefit to being able to set
    the limit much lower than what it currently defaults to today. We have a
    client that has a queue-type table that is updated very frequently. One
    of the fields is text, that is not updated as frequently. Keeping this
    table vacuumed well enough has proven to be problematic, because any
    delay to vacuuming quickly results in a very large amount of bloat.
    Moving that text field into a separate table would most likely be a win.

    Presumably this would need to be settable on at least a per-table basis.

    Would adding such a variable be a good beginner TODO, or is it too
    invasive?
    Well, we have now:

    ALTER TABLE ALTER [ COLUMN ] column
    SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

    What else is needed?
    As Tom suggested, I think it would be best to be able to change the size
    at which a field gets stored externally. I think it also makes sense to
    have this reverse the normal order of compress first, then if it still
    doesn't fit store it externally. I foresee this typically being useful
    when you have fields that are between ~100 and 1000 bytes in size, and
    I'm doubtful that compression would do much good there. But I wouldn't
    rule out this being useful on fields that can also sometimes contain
    much larger amounts of data, so I don't think it makes sense to disable
    compression completely. So, I think this leaves two new options:

    SET STORAGE EXTERNAL [THRESHOLD x]
    If a field is over x in size, it's stored externally.

    SET STORAGE EXTENDED [THRESHOLD x]
    If a field is over x in size, it's stored externally. If it's over
    BLCKSZ/4 it will also be compressed (I think that's how things work
    now).

    Actually, that's rather ugly. I think it would be better to just break
    external storage and compression out into their own attributes:

    SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]

    ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be stored externally. May be specified along with ALLOW
    COMPRESSION.

    ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

    --
    Bruce Momjian http://candle.pha.pa.us
    SRA OSS, Inc. http://www.sraoss.com

    + If your life is a hard drive, Christ can be your backup. +
  • Jim C. Nasby at Mar 3, 2006 at 10:16 pm
    If this would be accepted I might actually be able to accomplish this.
    Maybe. :) But having a TODO wouldn't be a bad idea as well...

    Would this require 2 new fields in pg_attribute, or is there a better
    way to store the thresholds? I'm thinking that each field would need two
    special values; 0 for 'no external/compression' and -1 for 'use default'
    (hrm, I guess this means we should add at least one GUC to control that
    default...)

    I suspect there's folks on -general who would express interest if you
    want me to ask there...
    On Thu, Mar 02, 2006 at 10:15:19PM -0500, Bruce Momjian wrote:

    Is there still interest in this idea for TODO?

    ---------------------------------------------------------------------------
    As Tom suggested, I think it would be best to be able to change the size
    at which a field gets stored externally. I think it also makes sense to
    have this reverse the normal order of compress first, then if it still
    doesn't fit store it externally. I foresee this typically being useful
    when you have fields that are between ~100 and 1000 bytes in size, and
    I'm doubtful that compression would do much good there. But I wouldn't
    rule out this being useful on fields that can also sometimes contain
    much larger amounts of data, so I don't think it makes sense to disable
    compression completely. So, I think this leaves two new options:

    SET STORAGE EXTERNAL [THRESHOLD x]
    If a field is over x in size, it's stored externally.

    SET STORAGE EXTENDED [THRESHOLD x]
    If a field is over x in size, it's stored externally. If it's over
    BLCKSZ/4 it will also be compressed (I think that's how things work
    now).

    Actually, that's rather ugly. I think it would be better to just break
    external storage and compression out into their own attributes:

    SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]

    ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be stored externally. May be specified along with ALLOW
    COMPRESSION.

    ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
    then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
    --
    Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
    Pervasive Software http://pervasive.com work: 512-231-6117
    vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
  • Hannu Krosing at Mar 4, 2006 at 10:15 am

    On Thu, 2006-03-02 at 22:15, Bruce Momjian wrote:
    Is there still interest in this idea for TODO?
    Just to voice my support - Yes, I think that being able to set lower
    thresholds for TOAST is very useful in several cases.

    Also, getting rid of the toast index and using ctids directly would
    be a big bonus.

    When using direct ctids we could use either ctid chains or some sort of
    skiplist for access to the N-th TOAST chunk.

    ------------
    Hannu
  • Martijn van Oosterhout at Mar 4, 2006 at 10:22 am

    On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
    Also getting rid of toast index and start using ctids directly would be
    a big bonus.

    When using direct ctids we could use either ctid chains or some sort of
    skiplist for access to N-th TOAST chunk.
    I suppose this would mean that you couldn't use vacuum on the toast
    table anymore. Or you'd have to teach vacuum that every time it moves
    a tuple it needs to update the original table (a sequential scan).
    What exactly are you trying to save here?

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
    tool for doing 5% of the work and then sitting around waiting for someone
    else to do the other 95% so you can sue them.
  • Tom Lane at Mar 4, 2006 at 3:31 pm

    Martijn van Oosterhout writes:
    On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
    Also getting rid of toast index and start using ctids directly would be
    a big bonus.
    When using direct ctids we could use either ctid chains or some sort of
    skiplist for access to N-th TOAST chunk.
    I suppose this would mean that you couldn't use vacuum on the toast
    table anymore.
    Another problem with it is that it'd destroy the current optimizations
    that allow partial fetches of uncompressed TOASTed fields to be fast.
    You couldn't fetch page N of a TOAST datum without reading all the pages
    before it.

    I suppose the objection that toast tables wouldn't be regular tables
    anymore might not be fatal, but you'll certainly get some pushback if
    you try to take away the direct-access optimizations.

    regards, tom lane
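
    The direct-access optimization mentioned here relies on TOAST
    chunks having a fixed, known size, so the chunks covering any byte
    range can be computed arithmetically and looked up through the
    toast index; a sketch (the 2000-byte chunk size is an assumption;
    the real constant is TOAST_MAX_CHUNK_SIZE and depends on the build):

```python
CHUNK_SIZE = 2000  # assumed; the real value is TOAST_MAX_CHUNK_SIZE

def chunks_for_slice(offset, length):
    """Chunk sequence numbers needed to read `length` bytes starting
    at `offset` of an uncompressed out-of-line datum. With fixed-size
    chunks this needs no walk over earlier chunks -- the property a
    plain ctid chain would lose."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return first, last

print(chunks_for_slice(0, 100))      # (0, 0)
print(chunks_for_slice(5000, 3000))  # (2, 3)
```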
  • Hannu Krosing at Mar 4, 2006 at 7:15 pm

    On Sat, 2006-03-04 at 10:31, Tom Lane wrote:
    Martijn van Oosterhout <kleptog@svana.org> writes:
    On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
    Also getting rid of toast index and start using ctids directly would be
    a big bonus.
    When using direct ctids we could use either ctid chains or some sort of
    skiplist for access to N-th TOAST chunk.
    I suppose this would mean that you couldn't use vacuum on the toast
    table anymore.
    Another problem with it is that it'd destroy the current optimizations
    that allow partial fetches of uncompressed TOASTed fields to be fast.
    You couldn't fetch page N of a TOAST datum without reading all the pages
    before it.

    I suppose the objection that toast tables wouldn't be regular tables
    anymore might not be fatal, but you'll certainly get some pushback if
    you try to take away the direct-access optimizations.
    That's why I was suggesting a skiplist instead of simple linked lists.

    Another way would be to put a list of all toast ctids for your whole
    toasted field in the first page(s) of the toast.

    That way you would still have the option of fast access to any part of
    the field, most likely even faster than with the current implementation,
    as you have to touch fewer pages. And you can have it for compressed
    fields too, if you store uncompressed offsets.

    -----------------
    Hannu

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Dec 1, '05 at 12:07p
active: Mar 4, '06 at 7:15p
posts: 14
users: 8
website: postgresql.org...
irc: #postgresql
