there have been a lot of different threads on this patch. i'm going to take a stab at teasing them out, summarizing them, and addressing them individually.
here is my categorization of the issues that have been raised about this patch:
1. is there really anyone that would actually benefit from this change?
2. what is the performance impact on more canonical use-cases (e.g. no trailing nulls)?
   2.1. look at the impact on a micro benchmark of insert performance (Tom's original suggestion)
   2.2. look at the impact on pgbench (the recent thread on this)
   2.3. what is statistically relevant when interpreting benchmarks (Noah's thoughtful comments)?
3. what is the impact on the use-case this is meant to address?
   3.1. space impact
   3.2. performance impact
   3.3. is this impact relevant ("attractive" in Tom's latest email)?
4. is the proposed change itself problematic?
   4.1. is it overly complicated?
   4.2. is there a more generalized approach that would be better (e.g. compress the bitmap)?
   4.3. does it violate a principle in the code (Tom's latest email)?
1) is there really anyone (other than me) that would benefit from this change?
this is a tough one since it is so subjective. i mentioned a particular real world case of mine where this patch as is makes a very significant improvement. others have chimed in with examples where they have seen this in the wild (Josh Berkus and his telemetry example, and Greg Stark mentioned seeing patterns where users have wide schemas and trailing nulls though i'm not sure about the context). that said, it is hard to turn these examples into a generalized notion of how applicable it might be.
i think the primary beneficiaries of this change are enterprise application vendors (though there are clearly other users that would benefit, like Josh). enterprise applications often need to deal with customizable fields in their object model. they typically model this with very wide schemas containing many generic nullable columns that have no semantic meaning until the customer decides to 'create' a custom field. i honestly suspect this is a fairly common pattern in enterprise applications, though it is hard to say how big a constituency this may be in the Postgres community. application developers at enterprise software vendors are often very savvy database users: they generally know exactly how the database lays out rows, and will consequently optimize their use of their wide schemas to minimize the space and performance impact of their custom fields. on other commercial databases trailing nulls are 'free', so enterprise application developers carefully make use of generic columns designated as custom fields by using the first ones first. they may even 'recycle' columns in crafty ways to ensure they are using early columns as much as possible.
i can give more detail if folks would like to hear more about this.
also, i know there have been some suggestions about different ways to approach this issue of customized columns without using such wide schemas. i can assure you that they are all even more problematic (generally in terms of performance). we can discuss this further as well. but i guess the main point here isn't whether this is the 'right' approach to application development, but that it is what people are in fact doing.
in summary, it is difficult to quantify how often Postgres users have wide schemas with trailing nulls, but i believe it might be a fairly common pattern with enterprise software vendors that support customization, and it may even show up a bit in end-user applications as well (e.g. Josh's example).
2) what is the performance impact on more canonical use-cases (e.g. no trailing nulls)?
a number of people (Tom, Simon, Kevin etc) have expressed a concern that any improvement to the particular use-case this patch is meant to address shouldn't negatively impact the performance of the more canonical use cases (e.g. no trailing nulls). i think there is a broad consensus on this (i agree as well). there are 3 sets of benchmark data addressing this concern. i generated a bunch of numbers doing a very specific micro-benchmark that attempts to isolate insert performance as suggested by Tom. that showed no performance degradation. Amit did some additional benchmarking. and finally there was a recent run of pgbench by Amit that originally showed a 1% degradation and a more recent run with 5 runs of both that showed a 0.2% degradation. [i'm very grateful to Amit for all of the work he has done on behalf of this, i'm sure it was more than he bargained for]
since the pgbench numbers were what caused the committed patch to be backed out, it makes sense to address this at length. i agree with Noah's comments that ~1% performance changes shouldn't necessarily be treated as significant: because of the inherent variance in test runs (stdev), because of the stricter and potentially onerous requirements this would likely impose on benchmarking in general (e.g. requiring the benchmarker to flush the filesystem across runs, isolate the machine from the network, follow some rules for proper statistical analysis of results, etc), and because even the changing size of a binary can have a demonstrable impact on some benchmarks. [parenthetically, it is probably a good idea for someone to take a stab at formalizing the interpretation of pgbench results (what degree of isolation is required, how to think about variance, and so forth). if folks think this is a good idea, a colleague is willing to take a stab at this]
nonetheless, it seemed reasonable to me to look into Amit's original numbers more closely. Amit's more recent 5-iteration run showed a ~0.2% performance degradation with the patch. that is arguably statistically irrelevant, but it did seem fairly repeatable, so a colleague of mine was kind enough to look into it. instead of trying to formally explain or eliminate this degradation, he made a slight adjustment to the order of an if/else in the patch, and now the patch seems marginally faster in pgbench (though, again, arguably statistically irrelevant). i have attached this latest version of the patch and included his pgbench numbers below.
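to make Noah's point about variance concrete, here is a rough sketch (my own illustration, not code from the thread) of how one might sanity-check whether a ~0.2% pgbench delta is distinguishable from run-to-run noise, using Welch's t statistic. the sample values are patterned on the table below, not the actual raw data:

```python
import math
import statistics

# illustrative tps values in the spirit of the pgbench table below
vanilla = [5213.30, 5229.80, 5183.47, 5140.82, 5180.93, 5169.55]
patched = [5168.70, 5158.28, 5183.01, 5217.11, 5194.44, 5202.02]

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(vanilla, patched)
delta_pct = 100 * (statistics.mean(patched) - statistics.mean(vanilla)) \
    / statistics.mean(vanilla)
# |t| well below ~2 means the difference is indistinguishable from noise
print(f"t = {t:.2f}, delta = {delta_pct:+.2f}%")
```

with per-run stdev around 25-30 tps on a ~5180 tps mean, a 0.2% shift (~10 tps) is well inside one standard deviation, which is why many more runs (or a formal test like the above) are needed before calling it a regression.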
so, i personally believe that the patch doesn't have a negative performance impact on the more canonical use-cases. there is a pretty significant amount of benchmarking to show this, and this makes intuitive sense if you examine the patch. so i believe the original reason for backing out the patch has been remedied by more extensive pgbench runs to remove a bit more noise and by a tiny adjustment in the code to make a minuscule performance improvement.
and, just to be clear, i'm not claiming that the patch actually improves the performance of canonical use cases. and i don't believe that anyone suggested that improving the performance of canonical use cases is a necessary requirement for this patch, but i might be mistaken about that.
3) what is the impact on the use-case this is meant to address?
i have produced some numbers that show significant space savings and performance improvements for the real world use-cases that i'm seeking to address. Amit and i have also demonstrated the impact of this change across a number of other scenarios (maybe 10 different scenarios in total). in the particular real world use-case i'm trying to address -- a table with 800+ nullable columns of which only 50 are set on average -- the impact is significant. there is a ~20% performance improvement on insert microbenchmarks and, more importantly to me, ~100 bytes saved per row on average. that is certainly relevant to me. given the challenges of achieving high page utilization with the mvcc/vacuuming model it might be argued that 100 bytes per row isn't important but it adds up in my case.
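the ~100 bytes/row figure can be sanity-checked with back-of-the-envelope arithmetic (my reconstruction, not code from the patch): the null bitmap needs one bit per attribute, and the heap tuple header offset is rounded up to MAXALIGN (8 bytes on most 64-bit platforms). the 23 here is the usual offset of the null bitmap within the heap tuple header:

```python
MAXALIGN = 8       # typical alignment on 64-bit platforms
T_BITS_OFFSET = 23  # typical offset of the null bitmap in the tuple header

def header_size(natts):
    """Aligned heap tuple header size for a row storing natts attributes."""
    bitmap_bytes = (natts + 7) // 8          # one bit per attribute
    unaligned = T_BITS_OFFSET + bitmap_bytes
    return (unaligned + MAXALIGN - 1) // MAXALIGN * MAXALIGN

full = header_size(800)      # bitmap sized for all 800 columns
truncated = header_size(50)  # bitmap ends at the last non-null attribute
print(full, truncated, full - truncated)
```

for 800 attributes the bitmap alone is 100 bytes; truncating it after the 50th attribute shrinks the aligned header from 128 to 32 bytes, i.e. ~96 bytes saved per row, consistent with the ~100 bytes i'm seeing.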
but of course many people have pointed out that the impact of this change is roughly proportional to the number of trailing nulls we are talking about. this comes back to the original question of whether the benefit is general enough. it is hard for me to assess whether the impact is relevant for other use cases without a clear statement of those other use cases. Josh may have an opinion on his particular use case.
4) is the proposed change itself problematic?
4.1) is it overly complicated?
the change's complexity vs. the original code's complexity is pretty subjective. the change basically just uses the number of attributes up to and including the last non-null one in the row to format the row header, instead of the tuple descriptor's natts. i obviously don't find the change excessively complicated, but i have bias. i think the only thing to do is for individuals to look at the patch and judge whether the code has grown more complicated, or too complicated.
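the core idea can be sketched in a few lines (a minimal illustration of the concept, not the actual C from the patch): instead of writing the tuple descriptor's natts into the row header, write the count of attributes up to the last non-null one, so the bitmap bits for trailing nulls are never stored:

```python
def effective_natts(values):
    """Number of attributes up to and including the last non-null one."""
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            return i + 1
    return 0  # all-null row: no bitmap needed at all

# an 800-column row where the last non-null attribute is at index 2
row = ["a", None, "b"] + [None] * 797
print(effective_natts(row))  # bitmap now covers 3 attributes, not 800
```

readers of the row then treat any attribute beyond the stored natts as null, which is exactly the logic that already exists for rows written before an ALTER TABLE added columns.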
4.2) is there a more generalized approach that would be better (e.g. compress the bitmap)
i think it was Simon that suggested perhaps a more general approach such as compressing the bitmap might be better because it might be more widely applicable. this is certainly a valid point. ironically i took the simplistic approach of truncating the bitmap expressly because of its simplicity, and because i thought there might be a decent chance that a more complicated change might be rejected on the basis of the complexity alone (if nothing else there would be two bitmap formats that would need to be supported assuming no upgrade). that said, it would be interesting to prototype a bitmap compression approach and see how much it complicates the code and how it impacts performance. the added complexity vs. the potential gain of this approach may not make that much sense for my particular use-case, but it may for the community as a whole.
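to illustrate the tradeoff (a toy comparison, purely my own, not a proposed on-disk format): truncating trailing zero bytes is a one-liner, while even the simplest run-length scheme needs an encoder, a decoder, and a second on-disk bitmap format:

```python
def truncate_trailing(bitmap: bytes) -> bytes:
    """The patch's approach: drop trailing all-null bytes of the bitmap."""
    return bitmap.rstrip(b"\x00")

def rle(bitmap: bytes) -> bytes:
    """A naive run-length encoding: (run_length, byte_value) pairs."""
    out = bytearray()
    i = 0
    while i < len(bitmap):
        j = i
        while j < len(bitmap) and bitmap[j] == bitmap[i] and j - i < 255:
            j += 1
        out += bytes([j - i, bitmap[i]])
        i = j
    return bytes(out)

# 800-bit bitmap where only the first 50 attributes are non-null
bitmap = b"\xff" * 6 + b"\x03" + b"\x00" * 93
print(len(truncate_trailing(bitmap)), len(rle(bitmap)))
```

for the trailing-null pattern the two are nearly identical in size (7 vs. 6 bytes here); compression only pulls ahead when nulls are interleaved throughout the row, which is the generality question Simon raised.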
4.3) does it violate a principle in the code (Tom's latest email)
from my perspective, the code already has to deal with the fact that natts on the persistent row can differ from the tupdesc's natts; this is the foundation upon which ALTER TABLE ADD NULL and DROP COLUMN as metadata-only operations are built. so i believe that this patch strengthens the code's ability to handle the ALTER TABLE ADD NULL case by generalizing it: the code will now deal with the disparity between the row's natts and the tupdesc routinely, rather than only after someone has done an ALTER TABLE. i'm an advocate of making corner cases go through generalized code where possible.
however, in all honesty i hadn't considered Tom's point that the patch creates a situation where natts on a row can deviate from the tupdesc it was built with (as opposed to the current tupdesc, from which it could always deviate due to a subsequent ALTER TABLE). this is a subtle point, and i don't think i have enough experience in the Postgres code to argue one way or the other about it. so i'll have to leave that determination up to people who are more familiar with the code.
thanks for reading all of this.
here are the performance numbers generated by my colleague against the current version of this patch (which is attached). to me this demonstrates that pgbench is not degraded, which was the reason the patch was backed out.
20130121 | 20130121 |
vanilla | Modified patch |
5213.30 | 5168.70 |
5229.80 | 5158.28 |
5183.47 | 5183.01 |
5140.82 | 5217.11 |
5180.93 | 5194.44 |
5169.55 | 5202.02 |
5199.78 | 5197.82 |
5228.76 | 5175.75 |
5187.35 | 5233.96 |
5118.66 | 5221.44 |
5176.20 | 5148.09 |
5215.89 | 5192.26 |
5221.13 | 5126.39 |
5232.44 | 5164.58 |
5188.14 | 5210.91 |
5176.77 | 5200.41 |
5218.18 | 5189.85 |
5207.82 | 5158.01 |
5203.77 | 5144.67 |
5213.74 | 5171.70 |
5154.11 | 5171.24 |
5165.22 | 5174.00 |
5196.96 | 5136.03 |
5166.75 | 5176.05 |
5116.69 | 5188.85 |
5170.82 | 5197.51 |
5124.63 | 5145.85 |
5162.05 | 5190.66 |
5198.82 | 5187.08 |
5155.55 | 5199.11 |
5166.95 | 5195.08 |
5197.23 | 5170.88 |
5145.07 | 5152.88 |
5178.24 | 5170.48 |
5128.73 | 5228.55 |
5201.38 | 5189.90 |
5161.96 | 5163.39 |
5191.68 | 5164.13 |
5193.44 | 5172.01 |
5150.87 | 5140.21 |
5118.95 | 5163.93 |
5184.59 | 5145.37 |
5135.52 | 5183.75 |
5197.49 | 5173.54 |
5186.67 | 5207.20 |
5176.33 | 5183.88 |
5185.09 | 5210.38 |
5124.34 | 5182.11 |
avg | 5178.50 | 5180.27 |
stdev | 31.51 | 24.53 |
-------+----------+----------------|

Here is the output of pgbench:

transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 8
number of threads: 8
duration: 300 s

I'm running with the database in /run/shm, and the following changes to the standard postgresql.conf:

shared_buffers = 1024MB
checkpoint_segments = 255
checkpoint_timeout = 15min
From: Tom Lane <firstname.lastname@example.org>
To: Noah Misch <email@example.com>
Cc: Simon Riggs <firstname.lastname@example.org>; Amit Kapila <email@example.com>; Kevin Grittner <firstname.lastname@example.org>; email@example.com; firstname.lastname@example.org; email@example.com; firstname.lastname@example.org
Sent: Wednesday, January 23, 2013 8:13 PM
Subject: Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Noah Misch <email@example.com> writes:
> On Wed, Jan 09, 2013 at 11:22:06AM +0000, Simon Riggs wrote:
>> On balance, it would seem optimizing for this special case would
>> affect everybody negatively; not much, but enough. Which means we
>> should reject this patch.

> I've failed to come up with a non-arbitrary reason to recommend for or against
> this patch, so I profess neutrality on the ultimate issue.
I think if we can't convincingly show an attractive performance gain,
we should reject the patch on the grounds of added complexity. Now,
the amount of added code surely isn't much, but nonetheless this patch
eliminates an invariant that's always been there (ie, that a tuple's
natts matches the tupdesc it was built with). That will at the very
least complicate forensic work, and it might well break third-party
code, or corners of the core system that we haven't noticed yet.
I'd be okay with this patch if it had more impressive performance
results, but as it is I think we're better off without it.
regards, tom lane