FAQ
While looking at the recently-noticed problem that HashAggregate nodes
store more columns of the input than they need to, I couldn't help
noticing how much of the hashtable space goes into HeapTuple header
overhead. A couple months ago we were able to get a useful improvement
in sorting by not storing unnecessary header fields in sort files, and
I'm strongly tempted to do the same in tuple hash tables.

Unlike the case with sort temp files, it's important to be able to
access the stored data without moving/copying it. So, not wishing to
duplicate all the tuple access machinery we have already, I'm
envisioning a compromise design that leaves a couple bytes on the table
but looks enough like a standard tuple to be directly usable.
Specifically, something like this:

typedef struct TruncatedTupleData
{
uint32 t_len; /* length of tuple */

char pad[...]; /* see below */

int16 t_natts; /* number of attributes */
... the rest matching HeapTupleHeaderData ...
}

The padding would be chosen such that the offset of t_natts would have
the same value modulo MAXIMUM_ALIGNOF as it has in HeapTupleHeaderData.
This ensures that if a TruncatedTuple is stored starting on a MAXALIGN
boundary, data within it is correctly aligned the same as it would be
in a normal tuple. With the current struct definitions, 2 bytes of
padding would be needed on all supported platforms, and a
TruncatedTuple would be 16 bytes shorter than a regular tuple.
However, because we are also eliminating a HeapTupleData struct, the
total savings in tuple hash tables would be 36 to 40 bytes per tuple.

To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
struct with its t_data field pointing 16 bytes before the start of the
TruncatedTuple. As long as the code using it never tries to access any
of the missing fields (t_xmin through t_ctid), this would work exactly
like a normal HeapTuple.

Going forward, we'd have to be careful to preserve the existing
field-ordering separation between transaction status fields and data
content fields in tuple headers, but that's just a matter of adding some
comments to htup.h.

It's tempting to think about using this same representation for tuples
stored in memory by tuplesort.c and tuplestore.c. That'd probably
require some changes in their APIs, but I think it's doable.

Comments? Anyone think this is too ugly for words?

regards, tom lane

Search Discussions

  • Martijn van Oosterhout at Jun 26, 2006 at 2:49 pm

    On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
    While looking at the recently-noticed problem that HashAggregate nodes
    store more columns of the input than they need to, I couldn't help
    noticing how much of the hashtable space goes into HeapTuple header
    overhead. A couple months ago we were able to get a useful improvement
    in sorting by not storing unnecessary header fields in sort files, and
    I'm strongly tempted to do the same in tuple hash tables.

    Unlike the case with sort temp files, it's important to be able to
    access the stored data without moving/copying it. So, not wishing to
    duplicate all the tuple access machinery we have already, I'm
    envisioning a compromise design that leaves a couple bytes on the table
    but looks enough like a standard tuple to be directly usable.
    I considered this, but ran into the problem that heap_getattr fell back
    to fastgetattr, which wouldn't know what kind of tuple it was given.
    Now, if you're going to add a special heap_getattr for these tuples,
    then ofcourse there's no problem.

    Maybe create a version of heap_getattr that takes the fallback function
    as a parameter?

    Anyway, I think it's a good idea. Most places in the backend after the
    SeqScan/IndexScan node really don't care about most of the header
    fields and being able to drop them would be nice.

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    From each according to his ability. To each according to his ability to litigate.
  • Tom Lane at Jun 26, 2006 at 2:57 pm

    Martijn van Oosterhout writes:
    On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
    Unlike the case with sort temp files, it's important to be able to
    access the stored data without moving/copying it. So, not wishing to
    duplicate all the tuple access machinery we have already, I'm
    envisioning a compromise design that leaves a couple bytes on the table
    but looks enough like a standard tuple to be directly usable.
    I considered this, but ran into the problem that heap_getattr fell back
    to fastgetattr, which wouldn't know what kind of tuple it was given.
    Now, if you're going to add a special heap_getattr for these tuples,
    then ofcourse there's no problem.
    No, I'm specifically *not* going to do that. AFAICS the proposed
    representation requires no changes in heap_getattr; if it did, it'd
    be too invasive to contemplate :-(. It should look exactly like any
    other HeapTuple structure, except that the "transaction status" fields
    will contain garbage. Do you see something I missed?

    regards, tom lane
  • Simon Riggs at Jun 26, 2006 at 6:40 pm

    On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
    On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
    While looking at the recently-noticed problem that HashAggregate nodes
    store more columns of the input than they need to, I couldn't help
    noticing how much of the hashtable space goes into HeapTuple header
    overhead. A couple months ago we were able to get a useful improvement
    in sorting by not storing unnecessary header fields in sort files, and
    I'm strongly tempted to do the same in tuple hash tables.
    Anyway, I think it's a good idea. Most places in the backend after the
    SeqScan/IndexScan node really don't care about most of the header
    fields and being able to drop them would be nice.
    I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
    is it possible to extend this further across the executor as Martijn
    suggests? That would be useful, even if it is slightly harder to measure
    the benefit than it is with the can-spill-to-disk cases.

    IMHO it would be better to call the format TupleWithoutVisibilityData so
    its very clear that accessing the visibility fields aren't present and
    any attempt to access them would be a mistake. TruncatedTuple isn't
    clear about what's missing or its consequences.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Jun 26, 2006 at 6:36 pm

    Simon Riggs writes:
    On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
    Anyway, I think it's a good idea. Most places in the backend after the
    SeqScan/IndexScan node really don't care about most of the header
    fields and being able to drop them would be nice.
    I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
    is it possible to extend this further across the executor as Martijn
    suggests? That would be useful, even if it is slightly harder to measure
    the benefit than it is with the can-spill-to-disk cases.
    There isn't any benefit except where we collect lots of tuples, which is
    to say tuplesort/tuplestore/tuplehashtable. In other places in the
    executor, there's basically only one transient tuple in existence per
    plan node; jumping through hoops to save 16 bytes per plan node is just
    silly. (What's more, as of 8.1 most of those tuples will be in "virtual
    tuple" format anyway, and so the optimization wouldn't make any
    difference at all...)
    IMHO it would be better to call the format TupleWithoutVisibilityData so
    its very clear that accessing the visibility fields aren't present and
    any attempt to access them would be a mistake. TruncatedTuple isn't
    clear about what's missing or its consequences.
    I'm not wedded to "TruncatedTuple", but I don't like your suggestion
    better; it presumes too much about what the difference might be between
    truncated and full tuples. Even today, I don't think
    "TupleWithoutVisibilityData" makes it clear that t_ctid is missing;
    and down the road there might be other header fields that we don't need
    to have in in-memory tuples. Another small problem is that given our
    naming conventions for structs vs pointers to structs, using "Data" as
    the last word of a struct name is a bad idea --- for consistency,
    pointers to it would be typedef TupleWithoutVisibility, which seems a
    bit weird. For consistency with the existing struct names, I think we
    need to choose a name of the form "<adjective>Tuple".

    I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
    that seems too generic. Any other thoughts?

    regards, tom lane
  • Simon Riggs at Jun 26, 2006 at 7:46 pm

    On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
    Simon Riggs <simon@2ndquadrant.com> writes:
    On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
    Anyway, I think it's a good idea. Most places in the backend after the
    SeqScan/IndexScan node really don't care about most of the header
    fields and being able to drop them would be nice.
    I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
    is it possible to extend this further across the executor as Martijn
    suggests? That would be useful, even if it is slightly harder to measure
    the benefit than it is with the can-spill-to-disk cases.
    There isn't any benefit
    OK, see that...
    I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
    that seems too generic. Any other thoughts?
    I like MemoryTuple but since we only use it when we go to disk...

    ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple

    Pick one: I'm sorry I opined.

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Jun 26, 2006 at 8:00 pm

    Simon Riggs writes:
    On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
    I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
    that seems too generic. Any other thoughts?
    I like MemoryTuple but since we only use it when we go to disk...
    ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple
    MinimalTuple seems good to me ...

    regards, tom lane
  • Tom Lane at Jun 26, 2006 at 8:43 pm

    I wrote:
    There isn't any benefit except where we collect lots of tuples, which is
    to say tuplesort/tuplestore/tuplehashtable. In other places in the
    executor, there's basically only one transient tuple in existence per
    plan node; jumping through hoops to save 16 bytes per plan node is just
    silly. (What's more, as of 8.1 most of those tuples will be in "virtual
    tuple" format anyway, and so the optimization wouldn't make any
    difference at all...)
    After further study of the code, here's my hit-list of places that could
    make worthwhile use of MinimalTuples:

    tuplesort.c (in-memory, on-disk case done already)
    tuplestore.c (in-memory and on-disk)
    TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan)
    hash joins (in-memory hash table and tuple "batch" files)
    analyze.c (tuples collected in-memory for stats analysis)

    It looks like there is actually not anyplace else in the executor where
    we "materialize" tuples anymore, except for execMain.c's INSERT/UPDATE
    code, which of course is going to want full tuples it can stash on disk.
    Everything else is dealing in TupleTableSlots that probably contain
    virtual tuples.

    So in one sense this *is* "all across the executor". But the amount of
    code to touch seems pretty limited.

    regards, tom lane
  • Simon Riggs at Jun 26, 2006 at 8:59 pm

    On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
    I wrote:
    There isn't any benefit except where we collect lots of tuples, which is
    to say tuplesort/tuplestore/tuplehashtable. In other places in the
    executor, there's basically only one transient tuple in existence per
    plan node; jumping through hoops to save 16 bytes per plan node is just
    silly. (What's more, as of 8.1 most of those tuples will be in "virtual
    tuple" format anyway, and so the optimization wouldn't make any
    difference at all...)
    After further study of the code, here's my hit-list of places that could
    make worthwhile use of MinimalTuples:

    tuplesort.c (in-memory, on-disk case done already)
    tuplestore.c (in-memory and on-disk)
    TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan)
    hash joins (in-memory hash table and tuple "batch" files)
    Thats the list I thought you meant.
    analyze.c (tuples collected in-memory for stats analysis)
    Do we save enough there for us to care?

    Will that allow us to increase the sample size for larger tables?

    --
    Simon Riggs
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Jun 26, 2006 at 9:08 pm

    Simon Riggs writes:
    On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
    analyze.c (tuples collected in-memory for stats analysis)
    Do we save enough there for us to care?
    Possibly not --- it's certainly low-priority, but I listed it for
    completeness.

    regards, tom lane
  • Bort, Paul at Jun 26, 2006 at 3:29 pm

    Tom Lane said:

    To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
    struct with its t_data field pointing 16 bytes before the start of the
    TruncatedTuple. As long as the code using it never tries to
    access any
    of the missing fields (t_xmin through t_ctid), this would work exactly
    like a normal HeapTuple.
    This sounds like a security risk. What's the worst thing that could be
    in
    those 16 bytes, and could that be used to bite (sorry) us?

    If those 16 bytes could be user data in another tuple, there might be an
    attack there.
  • Tom Lane at Jun 26, 2006 at 3:45 pm

    "Bort, Paul" <pbort@tmwsystems.com> writes:
    Tom Lane said:
    As long as the code using it never tries to access any
    of the missing fields (t_xmin through t_ctid), this would work exactly
    like a normal HeapTuple.
    This sounds like a security risk.
    No more than any other wild-pointer problem. At the level of C code
    it's always trivial to break anything ;-). The reason we don't need
    to worry is that in the upper levels of the executor there is no such
    thing as access to system columns --- any attempt to fetch a system
    column is converted to a Var reference that appears in the initial
    table-scan node, and thereafter it's an ordinary user column. This
    must be so because trying to keep the system columns in their normal
    privileged positions breaks down as soon as you consider a join; there
    would only be room for one, and the query might well be trying to fetch
    (say) ctid from more than one table. So any code that was trying to
    fetch these fields would be buggy anyway.

    In the case of the TupleHashTable code, the only access that need be
    provided is through a TupleTableSlot. We could get a little bit of
    protection against programming mistakes by using the "virtual tuple"
    feature of slots to disallow attempts to fetch any system columns from
    a truncated tuple. I'm not sure if this would be feasible for tuplesort
    though; haven't looked at how it's used yet.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedJun 26, '06 at 2:36p
activeJun 26, '06 at 9:08p
posts12
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase