The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap.


Applications often have wide rows in which many of the trailing column values are null. On an insert/update, all of the trailing null columns are tracked in the row bitmap. This can add a substantial overhead for very wide rows. This change truncates heap rows such that the trailing nulls are elided.

The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL is a metadata only change. Postgres works fine when a row's metadata (tuple descriptor) is inconsistent with the actual row data: extra columns are assumed to be null. This change just adjusts the number of attributes for a row and the row bitmap to only track up to the last non-null attribute.
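
To make the idea concrete, here is a minimal sketch of the mechanism (illustrative only, not the code of the submitted patch; the function and variable names are placeholders in the style of heap_form_tuple):

    /*
     * Sketch of the trimming step: given the column count and per-column
     * null flags that heap_form_tuple already receives, return the number
     * of attributes worth storing.  Readers treat attributes beyond the
     * stored natts as null, so no read-side change is required.
     */
    static int
    trimmed_natts(int numberOfAttributes, const bool *isnull)
    {
        int natts = numberOfAttributes;

        while (natts > 0 && isnull[natts - 1])
            natts--;
        return natts;
    }
    /* the result is then used for BITMAPLEN() and the header's natts field */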

Thanks.

-Jamie Martin

  • Tom Lane at Apr 17, 2012 at 4:39 pm

    Jameison Martin writes:
    The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap.
    This has been discussed before, but it always seemed that the
    cost-benefit ratio was exceedingly questionable. You don't get any
    savings whatsoever unless you reduce the size of the null bitmap across
    a MAXALIGN boundary, which more and more often is 64 bits, so that the
    frequency with which the optimization wins anything doesn't look likely
    to be that high. And on the other side of the coin, you're adding
    cycles to every single tuple-construction operation to make this work.
    The introduction of bugs doesn't seem improbable either. (Just because
    tuples in user tables might have unexpected natts values doesn't mean
    that the code is, or should be, prepared to tolerate that in system
    tables or plan-constructed tuples.)

    So what I'd like to see is some concrete test results proving that this
    is a net win, or at least not a net loss, for typical cases. Just
    asserting that it might be a win for certain usage patterns doesn't do
    it for me.

    regards, tom lane
  • Greg Stark at Apr 17, 2012 at 8:28 pm

    On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane wrote:
    This has been discussed before, but it always seemed that the
    cost-benefit ratio was exceedingly questionable.  You don't get any
    savings whatsoever unless you reduce the size of the null bitmap across
    a MAXALIGN boundary, which more and more often is 64 bits, so that the
    frequency with which the optimization wins anything doesn't look likely
    to be that high.
    There is the usage pattern where (brace yourself) people have
    thousands of columns in which they have all but a handful be null.
    They might be pretty happy about this. I'm not sure if that's a use
    case that makes sense to optimize for though -- even for them the
    space overhead would be noticeable but not a showstopper.

    --
    greg
  • Tom Lane at Apr 17, 2012 at 8:43 pm

    Greg Stark writes:
    On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane wrote:
    This has been discussed before, but it always seemed that the
    cost-benefit ratio was exceedingly questionable.  You don't get any
    savings whatsoever unless you reduce the size of the null bitmap across
    a MAXALIGN boundary, which more and more often is 64 bits, so that the
    frequency with which the optimization wins anything doesn't look likely
    to be that high.
    There is the usage pattern where (brace yourself) people have
    thousands of columns in which they have all but a handful be null.
    They might be pretty happy about this.
    Oh, I don't doubt that there are *some* use cases for this. I'm just
    dubious about how much we'd be slowing things down for everybody else.
    As I said, what I'd like to see are some benchmarks, and not just
    benchmarks that are tuned to match the sort of case where this wins.

    regards, tom lane
  • Jameison Martin at Apr 17, 2012 at 8:56 pm
    Thanks for the response.



    The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.
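
    For reference, the arithmetic behind those numbers can be checked standalone (this is not code from the patch; it assumes the usual 23 bytes of tuple header before the null bitmap and 8-byte MAXALIGN, and counts the whole aligned header rather than the bitmap alone, which gives the same 96-byte saving):

        #include <stdio.h>

        /* aligned header size for a tuple whose null bitmap covers natts attributes */
        static int header_bytes(int natts)
        {
            int bitmap = (natts + 7) / 8;        /* BITMAPLEN */
            return ((23 + bitmap) + 7) & ~7;     /* MAXALIGN to 8 bytes */
        }

        int main(void)
        {
            printf("800 attrs: %d bytes\n", header_bytes(800));   /* 128 */
            printf(" 50 attrs: %d bytes\n", header_bytes(50));    /* 32  */
            printf("saved:     %d bytes per row\n",
                   header_bytes(800) - header_bytes(50));         /* 96  */
            return 0;
        }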


    What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?

    Thanks.
    -jamie


  • Tom Lane at Apr 18, 2012 at 4:57 am

    Jameison Martin writes:
    The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.
    I can't help thinking that (a) this is an incredibly narrow use-case,
    and (b) you'd be well advised to rethink your schema design anyway.
    There are a whole lot of inefficiencies associated with having that many
    columns; the size of the null bitmap is probably one of the smaller
    ones. I don't really want to suggest an EAV design, but perhaps some of
    the columns could be collapsed into arrays, or something like that?
    What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?
    Hmm, well, most of the tables I've seen have fewer than 64 columns, so
    that the probability of win is exactly zero. Which would mean that
    you've got to demonstrate that the added overhead is unmeasurably small.
    Which maybe you can do, because there's certainly plenty of cycles
    involved in a tuple insertion, but we need to see the numbers.
    I'd suggest an INSERT/SELECT into a temp table as probably stressing
    tuple formation speed the most. Or maybe you could write a C function
    that just exercises heap_form_tuple followed by heap_freetuple in a
    tight loop --- if there's no slowdown measurable in that context, then
    a fortiori we don't have to worry about it in the real world.

    regards, tom lane
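
    A C function of the kind Tom describes could look roughly like the following (a sketch only, assuming the 9.2-era backend APIs; the function name and column type are arbitrary, and header locations differ slightly between versions):

        #include "postgres.h"
        #include "fmgr.h"
        #include "access/htup.h"      /* heap_form_tuple/heap_freetuple; htup_details.h on later versions */
        #include "access/tupdesc.h"
        #include "catalog/pg_type.h"

        PG_MODULE_MAGIC;
        PG_FUNCTION_INFO_V1(form_tuple_bench);

        /* Build and free a tuple 'iters' times; only the first of 'ncols'
         * int4 columns is non-null, the rest are trailing nulls. */
        Datum
        form_tuple_bench(PG_FUNCTION_ARGS)
        {
            int32       ncols = PG_GETARG_INT32(0);
            int32       iters = PG_GETARG_INT32(1);
            TupleDesc   desc = CreateTemplateTupleDesc(ncols, false);
            Datum      *values = (Datum *) palloc0(ncols * sizeof(Datum));
            bool       *isnull = (bool *) palloc(ncols * sizeof(bool));
            int         i;

            for (i = 0; i < ncols; i++)
            {
                TupleDescInitEntry(desc, (AttrNumber) (i + 1), NULL, INT4OID, -1, 0);
                isnull[i] = (i != 0);
            }
            values[0] = Int32GetDatum(42);

            for (i = 0; i < iters; i++)
            {
                HeapTuple   tup = heap_form_tuple(desc, values, isnull);

                heap_freetuple(tup);
            }
            PG_RETURN_INT32(iters);
        }

    Registered with CREATE FUNCTION ... LANGUAGE C and timed from psql, it can be compared directly between builds with and without the patch.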
  • Jameison Martin at Apr 18, 2012 at 5:50 am
    Regarding the schema: I'm afraid the schema cannot be changed at this point, though I appreciate
    the suggestions.

    Regarding an INSERT performance test, what kind of table shape would you like me to exercise?
    The patch as submitted may actually shave some cycles off the insertion of rows with trailing nulls even
    when there are fewer than 64 columns, because it avoids iterating over the null columns a second time in heap_fill_tuple(),
    so I want to be sure that I pick something that you feel is properly representative.

    Thanks.

    -Jamie


  • Jameison Martin at Apr 26, 2012 at 12:35 am
    Tom, I whipped up some  INSERT/SELECT tests where I selected into a temporary table as you suggested. The target temporary table and the source table were in cache and I basically disabled things that would cause noise. The source table had 5 integer columns, and was populated with 10 million rows.

    I tried 3 variations:
    1) target has all nullable columns, all set to non null values: the results were the same
    2) target has all nullable columns, only the first column is set: the patch was slightly faster
    3) target has all non-null columns: the patch maybe was slightly faster, probably not statistically relevant


    By slightly faster I'm talking on the order of 10 nanoseconds per row.

    I think #2 is explained by the reduction in loop iterations in heap_fill_tuple().


  • Simon Riggs at Apr 26, 2012 at 7:28 am

    On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin wrote:
    Tom, I whipped up some  INSERT/SELECT tests where I selected into a
    temporary table as you suggested. The target temporary table and the source
    table were in cache and I basically disabled things that would cause noise.
    The source table had 5 integer columns, and was populated with 10 million
    rows.

    I tried 3 variations:
    1) target has all nullable columns, all set to non null values: the
    results were the same
    2) target has all nullable columns, only the first column is set: the
    patch was slightly faster
    3) target has all non-null columns: the patch maybe was slightly faster,
    probably not statistically relevant

    By slightly faster I'm talking on the order of 10 nanoseconds per row.

    I think #2 is explained by the reduction in loop iterations in
    heap_fill_tuple().
    I see this as a useful use case that I have come across in a few
    cases, most typically associated with very large databases.

    It will be a win in those cases, but I think your maths is unrealistic
    for the common case. In your case, you're saying that you have 750
    trailing null columns that will be all-NULL in 90% of cases. Given a
    randomly distributed set of col values, I'd expect the last NULL to be
    on average around the 400th column, perhaps more. So the savings are
    still high, but not as high in the general case as it is for you.

    The performance tests Tom asks for are essential, otherwise we cannot
    proceed. Thanks for starting those.

    Please post your test code, any environment notes and your exact test
    results. The important point is that we need objectively confirmable
    tests, not just your word it was faster. Everybody is held to the same
    level of proof here, so it's not a personal doubt.

    It would be useful to post sizes of databases also, to confirm that
    the patch really does reduce database size.

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Greg Stark at Apr 26, 2012 at 1:43 pm

    On Thu, Apr 26, 2012 at 8:27 AM, Simon Riggs wrote:
    The source table had 5 integer columns, and was populated with 10 million
    rows.
    ...
    2) target has all nullable columns, only the first column is set: the
    patch was slightly faster
    ...
    By slightly faster I'm talking on the order of 10 nanoseconds per row.

    I think #2 is explained by the reduction in loop iterations in
    heap_fill_tuple().
    I see this as a useful use case that I have come across in a few
    cases, most typically associated with very large databases.
    Indeed, if this result holds up then I think that would be pretty
    convincing evidence. But I'm pretty skeptical. You're talking about
    five bitmap tests in the middle of a loop involving much more
    expensive steps. Can we see the raw numbers and the actual test case?

    What I think would be strong evidence it's a real effect is if you
    repeat the comparison with larger numbers and see the speedup scale
    up. For instance if you create a table with 100 nullable columns and
    one non-null column value stored in various columns. Is the difference
    between the runtimes for the 95th column and 100th column doubled when
    you compare the 90th and 100th column cases? And is it doubled again
    when you compare the 80th column and the 100th column cases? (Off the
    top of my head I think the null bitmap would take the same storage
    space for those four)
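
    For what it's worth, that guess appears to hold. A quick standalone check, assuming 23 bytes of header before the bitmap and 8-byte MAXALIGN as elsewhere in the thread:

        #include <stdio.h>

        int main(void)
        {
            int last_nonnull[] = {80, 90, 95, 100};
            int i;

            for (i = 0; i < 4; i++)
            {
                int n = last_nonnull[i];
                int hdr = ((23 + (n + 7) / 8) + 7) & ~7;   /* aligned header incl. bitmap */
                printf("last non-null at column %3d -> header %d bytes\n", n, hdr);
            }
            return 0;   /* all four cases print 40 bytes */
        }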

    --
    greg
  • Jameison Martin at Apr 26, 2012 at 6:59 pm
    Simon and Greg,

    The math on space savings is assuming that columns will be used roughly from first to last as declared in the DDL, not a random distribution of column values. This is the case for the particular schema that I'm looking at. I'm not asserting that it is the common case in general, though it may be more common than not given the fact that several commercial databases optimize for trailing null column values and developers often pay attention to this.

    If there is an exact standard as to how this group does performance analysis (e.g. removing outliers beyond a certain standard deviation, number of repetitions, machine isolation requirements and so forth), please let me know. I can submit my results as is, but in the interest of avoiding a lot of duplicate postings perhaps someone can point me to an example of what kinds of numbers are desired, so I can make sure my posting conforms to that. For what it is worth I ran the 3 tests 10 times each and removed the outliers, but I can run 100 times or do something different if need be (e.g. post a csv for easy consumption in a spreadsheet). Also, Simon, you mentioned posting "environment notes"; can you let me know what kind of environment notes are desired? For example, are you thinking about changes to the vanilla postgresql.conf, hardware information, OS config, etc.?

    Greg, all I'm trying to establish is that this change doesn't hurt insert performance for the common case as per Tom's comments. I'll try to add some additional test cases with varying trailing null column values to see if we can establish the potential salutary effect with a bit more data, but I'm not actually asserting that this is significant or a justification for the patch. It would be interesting to see what the performance benefit is with real queries against rows that have much smaller bitmaps, but I'd prefer not to get into that.

    As for proof of the size reduction, I'd actually like to codify something in a regression test to ensure there are no regressions in the behavior of the patch. I was a little leery of creating a regression test that is dependent on internals that might cause the test to break over time, so I punted on it. Does anyone have a good suggestion as to a safe way to codify that the proposed behavioral change is working as intended in the form of a test that is unlikely to break over time? The best thing I could come up with was to create a very wide table, insert some sparse rows (trailing nulls), and verify the resulting number of pages. In any event, I'll also post a comparative relation size number and test as well.

    Cheers.

    -Jamie


  • Jameison Martin at May 2, 2012 at 6:07 pm
    Attached are the following as per various requests:
    * test_results.txt: the performance benchmarking results,

    * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests

    * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and

    * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
    I ran the tests against a recent pull from git that I made a week ago or so, both with and without my patch. The results are marked as BASELINE (without my patch) and PATCH (with my patch). As I mentioned previously, I took Tom's advice and ran INSERT SELECT into a temporary table to get some idea of the impact of the proposed patch on the INSERT codepath. The DDL that the test ran is stated in the results along with the time the test took and the size of the target table. The INSERT SELECT always inserted 10 million rows per iteration.  I mostly focused on smaller schemas to address Tom's concerns. I also added some wider schemas as per Simon and Greg. Note that the smaller schema runs fit in memory whereas the wider ones did not necessarily fit in memory; the wider schemas are primarily intended to clearly demonstrate the space savings.

    When inserting rows with trailing nulls the patch always improves insert performance. Row size is decreased when the row bitmap can be truncated to something smaller. I'm not seeing a performance degradation without trailing nulls. I'm not asserting that the performance improvement justifies the change, just that the patch can have a significant impact on row size in the scenarios that I have outlined in previous emails (800 nullable columns with only the first 50 set). The fact that it improves insert performance in some cases is gravy in my opinion because this is a micro benchmark and we aren't talking about significant performance differences (in general we are talking about nanoseconds per row).

    Hopefully the test output and the code is pretty self-explanatory.

    If anyone wants to run TestTrailingNull.java for themselves you'll need the postgres jdbc driver and junit in your classpath.

    Thanks.

    -Jamie
  • Josh Berkus at May 3, 2012 at 1:01 am

    On 5/2/12 10:20 AM, Jameison Martin wrote:
    Attached are the following as per various requests:
    * test_results.txt: the performance benchmarking results,

    * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests

    * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and

    * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
    Nice, thanks. I'll try some of my own tests when I get a chance; I have
    a really good use-case for this optimization.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Josh Berkus at May 3, 2012 at 6:52 pm
    Tom,

    So that I can test this properly, what is the specific use-case we'd
    expect to be slow with this patch?

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Robert Haas at Jun 26, 2012 at 9:04 pm

    On Wed, May 2, 2012 at 9:01 PM, Josh Berkus wrote:
    On 5/2/12 10:20 AM, Jameison Martin wrote:
    Attached are the following as per various requests:
    * test_results.txt: the performance benchmarking results,

    * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests

    * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and

    * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
    Nice, thanks.  I'll try some of my own tests when I get a chance; I have
    a really good use-case for this optimization.
    Josh,

    The CommitFest application lists you as the reviewer for this patch.
    Are you (I hope) planning to review it?

    I see you posted up a follow-up email asking Tom what he had in mind.
    Personally, I don't think this needs incredibly complicated testing.
    I think you should just test a workload involving inserting and/or
    updating rows with lots of trailing NULL columns, and then another
    workload with a table of similar width that... doesn't. If we can't
    find a regression - or, better, we find a win in one or both cases -
    then I think we're done here.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Josh Berkus at Apr 27, 2012 at 12:51 am
    Tom,
    I can't help thinking that (a) this is an incredibly narrow use-case,
    and (b) you'd be well advised to rethink your schema design anyway.
    It's more common than you'd think. Both EAV and Hstore have their own
    (severe) drawbacks.

    For example, I'm working on an application which collects telemetry data
    from hardware. This can involve up to 700 columns of data, most of
    which are booleans, and an awful lot of which are NULL.

    Also, adding lots of columns *is* following "proper" relational design
    like we urge users to do, so it would be nice to make it performant.

    Now, the other issue I'd be worried about for this optimization is what
    happens when the nulls become non-trailing? For example, this pattern:

    1. Out of 700 columns, columns 301+ are all Null, so we map them away.
    2. User updates column 688 to non-null
    3. Suddenly we have a MUCH larger row which will no longer fit on the page.

    If your application had a lot of that kind of update pattern, I'd be
    concerned that this would be a deoptimization.
    If there is a exact standard as to how this group does performance
    analysis (e.g. removing outliers beyond a certain standard deviation,
    number of repetitions, machine isolation requirements and so forth),
    please let me know.
    Oh, don't I wish! We're a lot more "cowboy" than that. Greg Smith and
    Mark Wong have been trying to build a performance testing
    infrastructure, but right now our test software and methodology is
    *very* primitive. You're welcome to help and suggest.
    I can submit my results as is but in the interest
    of avoiding a lot of duplicate postings perhaps someone can point me
    to an example of what kinds of numbers are desired so I can make sure
    my posting conforms to that. For what it is worth I ran the 3 tests
    10 times each and removed the outliers, but I can run 100 times or do
    something different if need be (e.g. post a csv for easy consumption
    in a spreadsheet).
    Actually, I think just doing a battery of pgbench tests, for both the
    bigger and smaller than memory cases, with the patch installed, would
    give us some results for the non-NULL case. Something more
    sophisticated like DVDstore or DBT2 would be even better, since the
    tables there have more columns.
    I tried 3 variations:
    1) target has all nullable columns, all set to non null values: the
    results were the same
    2) target has all nullable columns, only the first column is set:
    the patch was slightly faster
    3) target has all non-null columns: the patch maybe was slightly
    faster, probably not statistically relevant

    This seems pretty on-target; can you share the numbers, the nature of
    the test, and the setup with us so that we can evaluate it?
    Also, Simon, you mentioned posting "environment
    notes", can you let me know what kind of environment notes are
    desired? For example, are you thinking about changes to the vanilla
    postgresql.conf, hardware information, OS config, etc?
    Yes, exactly.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Simon Riggs at Apr 27, 2012 at 10:06 am

    On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus wrote:

    Now, the other issue I'd be worried about for this optimization is what
    happens when the nulls become non-trailing?  For example, this pattern:

    1. Out of 700 columns, columns 301+ are all Null, so we map them away.
    2. User updates column 688 to non-null
    3. Suddenly we have a MUCH larger row which will no longer fit on the page.

    If your application had a lot of that kind of update pattern, I'd be
    concerned that this would be a deoptimzation.
    Currently, we have a long row before and a long row after. Jamie's
    proposals would give us a short row before and a long row after.

    Since we don't ever update in place, we're much more likely to fit on
    the same page with this optimisation than without it. I guess we can
    check that with a performance test.

    (Perhaps a more obvious optimisation would be to use a compressed NULL
    bitmap. That would respond better in a wider range of use cases than
    just truncation of trailing zeroes.)

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Greg Stark at Apr 28, 2012 at 10:23 pm

    On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus wrote:
    1. Out of 700 columns, columns 301+ are all Null, so we map them away.
    2. User updates column 688 to non-null
    3. Suddenly we have a MUCH larger row which will no longer fit on the page.
    Note that this is only actually 48 bytes more in the null bitmap in
    this scenario. That's part of Tom's point that the null bitmap is so
    dense that you have to be talking about some pretty huge number of
    columns before the size savings are noticeable. Saving 48 bytes is
    nothing to sneeze at but it's hardly an impractical update to handle.

    --
    greg
  • Robert Haas at Apr 28, 2012 at 11:24 pm

    On Sat, Apr 28, 2012 at 6:23 PM, Greg Stark wrote:
    On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus wrote:
    1. Out of 700 columns, columns 301+ are all Null, so we map them away.
    2. User updates column 688 to non-null
    3. Suddenly we have a MUCH larger row which will no longer fit on the page.
    Note that this is only actually 48 bytes more in the null bitmap in
    this scenario. That's part of Tom's point that the null bitmap is so
    dense that you have to be talking about some pretty huge number of
    columns before the size savings are noticeable. Saving 48 bytes is
    nothing to sneeze at but it's hardly an impractical update to handle.
    More to the point, if the old row were 48 bytes larger, that would not
    increase the chance of the new row fitting on the page. You've got to
    store the old and new row no matter what: if the old one can be made
    smaller than otherwise, that's a win regardless of whether the new one
    is also smaller or not.

    The other point I feel we're overlooking here is... according to
    Jamie, the patch actually made things FASTER in every case he thought
    to test, and those cases don't appear to be anything particularly
    favorable to the patch, so that's probably a good sign. I'd like to
    see the exact numbers from each test run, and a complete reproducible
    test case, but at present all signs seem to point to this change being
    a savings in both time and space. Let's not go looking for reasons to
    reject the approach just because we didn't expect it to work as well
    as it does.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Apr 29, 2012 at 11:19 am

    On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas wrote:

    Let's not go looking for reasons to
    reject the approach just because we didn't expect it to work as well
    as it does.
    Who here, in your opinion, is looking for reasons to reject anything?

    --
    Simon Riggs                   http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Gokulakannan Somasundaram at Apr 29, 2012 at 11:51 am
    There might be a patch available for this already. In the worst case
    articulated above (fewer than 64 columns), if all the nulls are trailing
    nulls, the bitmap need not be saved. Actually it is not 64 (actually 72), as
    the postgres heap tuple header is only 23 bytes and one byte is left for the
    start of the bitmap.

    The same principle can be considered for Index Tuple as an extension

    Thanks,
    Gokul.
  • Robert Haas at Apr 29, 2012 at 9:02 pm

    On Sun, Apr 29, 2012 at 7:19 AM, Simon Riggs wrote:
    On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas wrote:
    Let's not go looking for reasons to
    reject the approach just because we didn't expect it to work as well
    as it does.
    Who here, in your opinion, is looking for reasons to reject anything?
    I'm just saying that there seems to be a bit more skepticism here than
    can be justified considering that the test results are all on one
    side. It wouldn't take a lot of evidence to convince me that this is
    a bad idea, but it will take more than none, which is the amount we
    have now.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Aug 9, 2012 at 10:14 am

    On 17 April 2012 17:22, Jameison Martin wrote:

    The following patch truncates trailing null attributes from heap rows to
    reduce the size of the row bitmap.
    The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL
    is a metadata only change. Postgres works fine when a row's metadata (tuple
    descriptor) is inconsistent with the actual row data: extra columns are
    assumed to be null. This change just adjusts the number of attributes for a
    row and the row bitmap to only track up to the last non-null attribute.
    This is an interesting patch, but it has had various comments made about it.

    When I look at this I see that it would change the NULL bitmap for all
    existing rows, which means it forces a complete unload/reload of data.
    We've moved away from doing things like that, so in its current form
    we'd probably want to reject that.

    If I might suggest a way forward?

    Keep NULL bitmaps as they are now. Have another flag which indicates
    when a partial trailing col trimmed NULL bitmap is in use. Then we can
    decide whether a table will benefit from full or partial bitmap and
    set that in the tupledesc. That way the tupledesc will show
    heap_form_tuple which kind of null bitmap is preferred for new tuples.
    That preference might be settable by user on or off, but the default
    would be for postgres to decide that for us based upon null stats etc,
    which we would decide at ANALYZE time.

    That mechanism is both compatible with existing on-disk formats and
    means that the common path for smaller tables is unaffected, yet we
    gain the benefit of the patch for larger tables.

    It would be good to see you take this all the way.

    --
    Simon Riggs http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Tom Lane at Aug 9, 2012 at 2:27 pm

    Simon Riggs writes:
    On 17 April 2012 17:22, Jameison Martin wrote:
    The following patch truncates trailing null attributes from heap rows to
    reduce the size of the row bitmap.
    This is an interesting patch, but it has had various comments made about it.
    When I look at this I see that it would change the NULL bitmap for all
    existing rows, which means it forces a complete unload/reload of data.
    Huh? I thought it would only change how *new* tuples were stored.
    Old tuples ought to continue to work fine.

    I'm not really convinced that it's a good idea in the larger scheme
    of things --- your point in a nearby thread that micro-optimizing
    storage space at the expense of all else is not good engineering
    applies here. But I don't see that it forces data reload. Or if
    it does, that should be easily fixable.
    ... Have another flag which indicates
    when a partial trailing col trimmed NULL bitmap is in use.
    That might be useful for forensic purposes, but on the whole I suspect
    it's just added complexity (and eating up a valuable infomask bit)
    for relatively little gain.
    ... decide whether a table will benefit from full or partial bitmap and
    set that in the tupledesc. That way the tupledesc will show
    heap_form_tuple which kind of null bitmap is preferred for new tuples.
    That preference might be settable by user on or off, but the default
    would be for postgres to decide that for us based upon null stats etc,
    which we would decide at ANALYZE time.
    And that seems like huge overcomplication. I think we could probably
    do fine with some very simple fixed policy, like "don't bother with
    this for tables of less than N columns", where N is maybe 64 or so
    and chosen to match the MAXALIGN boundary where there actually could
    be some savings from trimming the null bitmap.

    (Note: I've not read the patch, so maybe Jameison already did something
    of the sort.)

    regards, tom lane
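
    A fixed policy of that shape could be as simple as the following (a sketch, not code from the patch; the threshold and names are placeholders):

        /*
         * Only bother scanning for trailing nulls when the table is wide
         * enough that trimming could ever cross a MAXALIGN boundary.
         */
        #define MIN_ATTS_FOR_TRIM   64      /* placeholder threshold */

        static int
        natts_to_store(int numberOfAttributes, const bool *isnull)
        {
            int natts = numberOfAttributes;

            if (natts < MIN_ATTS_FOR_TRIM)
                return natts;               /* narrow table: keep the full bitmap */
            while (natts > 0 && isnull[natts - 1])
                natts--;
            return natts;
        }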
  • Simon Riggs at Aug 9, 2012 at 2:49 pm

    On 9 August 2012 15:27, Tom Lane wrote:
    Simon Riggs <simon@2ndQuadrant.com> writes:
    On 17 April 2012 17:22, Jameison Martin wrote:
    The following patch truncates trailing null attributes from heap rows to
    reduce the size of the row bitmap.
    This is an interesting patch, but it has had various comments made about it.
    When I look at this I see that it would change the NULL bitmap for all
    existing rows, which means it forces a complete unload/reload of data.
    Huh? I thought it would only change how *new* tuples were stored.
    Old tuples ought to continue to work fine.
    That wasn't my understanding, but that could be wrong.
    I'm not really convinced that it's a good idea in the larger scheme
    of things --- your point in a nearby thread that micro-optimizing
    storage space at the expense of all else is not good engineering
    applies here. But I don't see that it forces data reload. Or if
    it does, that should be easily fixable.
    Large numbers of columns are surprisingly common and tables with large
    numbers of columns usually have many rows as well. So this doesn't
    matter for most tables, but the few that need this can often represent
    80% of database volume, so it is important.
    (Next challenge is how to cope with 1000s of columns.)
    And that seems like huge overcomplication. I think we could probably
    do fine with some very simple fixed policy, like "don't bother with
    this for tables of less than N columns", where N is maybe 64 or so
    and chosen to match the MAXALIGN boundary where there actually could
    be some savings from trimming the null bitmap.
    "One simple tweak" works for me.

    --
    Simon Riggs http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
  • Jameison Martin at Aug 9, 2012 at 3:56 pm
    Simon, Tom is correct: the patch doesn't change the existing row format contract or the format of the null bitmap. The change only affects how new rows are written out. And it uses the same supported format that has always been there (which is why alter table add col null works the way it does). And it keeps to the same MAXALIGN boundaries that are there today.

    One could argue that different row formats could make sense in different circumstances, and I'm certainly open to that kind of discussion, but this change is far more modest and perhaps can be made on its own since it doesn't perturb the code base much, improves performance (marginally) and improves the size of rows with lots of trailing nulls.

    [separate topic: pluggable heap manager]
    I'm quite interested in pursuing more aggressive compression strategies, and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager implementation and would be interested in feedback on that as a general approach. My thinking is that I'd like to be able to have PostgreSQL support multiple heap implementations along the lines of how multiple index types are supported, though probably only the existing heap manager implementation would be part of the actual codeline. I've done a little exploratory work of looking at the heap interface. I was planning on doing a little prototyping before suggesting anything concrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking of cleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at formalizing the interface along the lines of the index layering. So ideally I'd make a
    few separate submissions and if all goes according to plan I'd be able to have a pluggable heap manager implementation that I could work on independently and which could in theory use the same hooks as the existing heap implementation. And if it turns out that my implementation is deemed to be general enough it could be released to the community.

    If I do decide to pursue this, can anyone suggest the best way to solicit feedback? I see that some proposals get shared on the postgres wiki. I could put something up there to frame the issue and encourage some back and forth dialog. Or is email the way that this kind of exchange tends to happen? Ultimately I'd like to get into a bit of detail about what the actual heap manager contract is and so forth.

    Note that I'm a ways from really knowing if this is feasible on my end, so this is quite speculative at this point. But I'd like to introduce the topic and get some feedback on the right way to communicate as early as possible.

    Thanks.

    -Jamie


  • Jim Nasby at Aug 9, 2012 at 10:58 pm

    On 8/9/12 10:56 AM, Jameison Martin wrote:
    [separate topic: pluggable heap manager]
    I'm quite interested in pursuing more aggressive compression strategies, and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager implementation and would be interested in feedback on that as a general approach. My thinking is that I'd like to be able to have PostgreSQL support multiple heap implementations along the lines of how multiple index types are supported, though probably only the existing heap manager implementation would be part of the actual codeline. I've done a little exploratory work of looking at the heap interface. I was planning on doing a little prototyping before suggesting anything concrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking of cleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at formalizing the interface along the lines of the index layering. So ideally I'd make a few separate
    submissions and if all goes according to plan I'd be able to have a pluggable heap manager implementation that I could work on independently and which could in theory use the same hooks as the existing heap implementation. And if it turns out that my implementation is deemed to be general enough it could be released to the community.
    I'm definitely interested in things that can shrink our working-set-size; things that others might not be keen on. (Like having the on-disk format be tighter than the in-memory one). Having the ability to put in different heap storage could be a good way to accommodate that. Especially if you could change it on a per-table basis.
    --
    Jim C. Nasby, Database Architect jim@nasby.net
    512.569.9461 (cell) http://jim.nasby.net
  • Tom Lane at Aug 10, 2012 at 12:06 am

    Jameison Martin writes:
    [separate topic: pluggable heap manager]
    I'm quite interested in pursuing more aggressive compression
    strategies, and I'd like to do so in the context of the heap
    manager. I'm exploring having a pluggable heap manager implementation
    and would be interested in feedback on that as a general approach. My
    thinking is that I'd like to be able to have PostgreSQL support
    multiple heap implementations along the lines of how multiple index
    types are supported, though probably only the existing heap manager
    implementation would be part of the actual codeline.
    There's been some previous talk about "pluggable heap managers"; you
    might try searching our archives. Unfortunately the story is not very
    good, especially if what you are really interested in is playing games
    with the format of heap tuples. We just haven't got an abstraction
    boundary that isolates that very well. I think the first thing that
    would have to be worked out before we could even discuss having multiple
    heap managers is where the abstraction boundary would get drawn and how
    we'd have to refactor existing code to make it fly.
    If I do decide to pursue this, can anyone suggest the best way to solicit
    feedback?
    Usually people just post proposals for discussion on pgsql-hackers.
    Obviously, at the early stages such a proposal might be pretty vague,
    but that's fine as long as you can explain what kind of feedback you're
    hoping for.

    regards, tom lane
