I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
It's more common than you'd think. Both EAV and Hstore have their own
(severe) drawbacks.

For example, I'm working on an application which collects telemetry data
from hardware. This can involve up to 700 columns of data, most of
which are booleans, and an awful lot of which are NULL.

Also, adding lots of columns *is* following "proper" relational design
like we urge users to do, so it would be nice to make it performant.

Now, the other issue I'd be worried about for this optimization is what
happens when the NULLs become non-trailing? For example, this pattern:

1. Out of 700 columns, columns 301+ are all NULL, so we map them away.
2. User updates column 688 to non-NULL.
3. Suddenly we have a MUCH larger row which will no longer fit on the page.

If your application had a lot of that kind of update pattern, I'd be
concerned that this would be a deoptimization.
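The failure mode above can be sketched with a toy serializer. To be clear, this is only an illustration of the principle, not PostgreSQL's actual heap-tuple format; the `serialize` helper and its one-byte-per-value costing are invented for the example:

```python
def serialize(values):
    """Toy row serializer: trailing NULLs are not stored at all.
    A null bitmap covers only the columns that are kept, and each
    non-NULL value is assumed to cost one byte of data."""
    # Find the last non-NULL column; everything after it is dropped.
    last = max((i for i, v in enumerate(values) if v is not None), default=-1)
    stored = values[:last + 1]
    bitmap_bytes = (len(stored) + 7) // 8   # one bit per stored column
    data_bytes = sum(1 for v in stored if v is not None)
    return bitmap_bytes + data_bytes

row = [True] * 300 + [None] * 400   # columns 301..700 are all NULL
before = serialize(row)             # trailing NULLs cost nothing here
row[687] = True                     # user sets column 688 to non-NULL
after = serialize(row)
print(before, after)                # the updated row is markedly larger
```

One non-NULL value near the end forces the null bitmap to cover all 688 leading columns again, so the row grows by far more than the single value written, which is exactly the page-overflow risk described above.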
If there is an exact standard as to how this group does performance
analysis (e.g. removing outliers beyond a certain standard deviation,
number of repetitions, machine isolation requirements and so forth),
please let me know.
Oh, don't I wish! We're a lot more "cowboy" than that. Greg Smith and
Mark Wong have been trying to build a performance testing
infrastructure, but right now our test software and methodology is
*very* primitive. You're welcome to help and suggest.
I can submit my results as-is, but in the interest
of avoiding a lot of duplicate postings, perhaps someone can point me
to an example of what kinds of numbers are desired so I can make sure
my posting conforms to that. For what it is worth, I ran the 3 tests
10 times each and removed the outliers, but I can run 100 times or do
something different if need be (e.g. post a csv for easy consumption
in a spreadsheet).
Actually, I think just doing a battery of pgbench tests, for both the
bigger and smaller than memory cases, with the patch installed, would
give us some results for the non-NULL case. Something more
sophisticated like DVDstore or DBT2 would be even better, since the
tables there have more columns.
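A battery like that might look as follows. This is only a sketch: the scale factors, client counts, and durations are placeholders to be tuned so that one database fits in RAM and the other does not, and each run would be repeated against both patched and unpatched builds:

```shell
# Initialize two pgbench databases; pick -s (scale factor) so that
# bench_small fits in memory and bench_big does not (placeholders here).
pgbench -i -s 100 bench_small
pgbench -i -s 4000 bench_big

# Run the standard TPC-B-like workload against each, recording tps
# for the patched and unpatched servers separately.
pgbench -c 8 -j 4 -T 300 bench_small
pgbench -c 8 -j 4 -T 300 bench_big
```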
I tried 3 variations:
1) target has all nullable columns, all set to non null values: the
results were the same
2) target has all nullable columns, only the first column is set:
the patch was slightly faster
3) target has all non-null columns: the patch was maybe slightly
faster, though probably not statistically significant

This seems pretty on-target; can you share the numbers, the nature of
the test, and the setup with us so that we can evaluate it?
Also, Simon, you mentioned posting "environment
notes"; can you let me know what kind of environment notes are
desired? For example, are you thinking about changes to the vanilla
postgresql.conf, hardware information, OS config, etc?
Yes, exactly.

Josh Berkus
PostgreSQL Experts Inc.

Discussion: pgsql-hackers — posted Apr 17, 2012; last active Aug 10, 2012.