On 6/17/13 3:38 PM, Josh Berkus wrote:
Why? Why can't we just update the affected pages in the index?

The page range has to be scanned in order to find out the min/max values
for the indexed columns on the range; and then, with these data, update
the index.
Seems like you could incrementally update the range, at least for
inserts. If you insert a row which doesn't decrease the min or increase
the max, you can ignore it, and if it does increase/decrease, you can
change the min/max. No?

For updates, things are more complicated. If the row you're updating
was the min/max, in theory you should update it to adjust that, but you
can't verify that it was the ONLY min/max row without doing a full scan.
My suggestion would be to add a "dirty" flag which would indicate that
that block could use a rescan next VACUUM, and otherwise ignore changing
the min/max. After all, the only defect to having min to low or max too
high for a block would be scanning too many blocks. Which you'd do
anyway with it marked "invalid".
If we add a dirty flag it would probably be wise to allow for more than one value so we can do a clock-sweep. That would allow for detecting a range that is getting dirtied repeatedly and not bother to try and re-summarize it until later.

Something else I don't think was mentioned... re-summarization should be somehow tied to access activity: if a query will need to seqscan a segment that needs to be summarized, we should take that opportunity to summarize at the same time while pages are in cache. Maybe that can be done in the backend itself; maybe we'd want a separate process.
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 22 of 27 | next ›
Discussion Overview
grouppgsql-hackers @
postedJun 14, '13 at 10:28p
activeJul 19, '13 at 4:43p



site design / logo © 2021 Grokbase