On 4/5/13 6:39 PM, Jeff Davis wrote:
On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:
Maybe we could scan forward to check whether a corrupted WAL record is
followed by one or more valid ones with sensible LSNs. If it is,
chances are high that we haven't actually hit the end of the WAL. In
that case, we could either log a warning, or (better, probably) abort
crash recovery.

+1.
Corruption of fields which we require to scan past the record would
cause false negatives, i.e. not trigger an error even though we do
abort recovery mid-way through. There's a risk of false positives too,
but they require quite specific orderings of writes and thus seem
rather unlikely. (AFAICS, the OS would have to write some parts of
record N followed by the whole of record N+1 and then crash to cause a
false positive).
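
To make the scan-forward idea concrete, here is a minimal sketch in Python. It uses a toy record layout (payload length, LSN, CRC-32 of the payload), which is an assumption for illustration and not PostgreSQL's actual WAL format; the names make_record, read_record and valid_records_after are hypothetical. It counts how many valid records with increasing LSNs follow a record that failed its CRC check. If the length field of the bad record is itself damaged, the scan cannot find the next record and reports zero, which is the false-negative case described above.

```python
import struct
import zlib

# Toy header: payload length, LSN, CRC-32 of payload (not PostgreSQL's layout).
HEADER = struct.Struct("<IQI")


def make_record(lsn, payload):
    """Serialize one toy record."""
    return HEADER.pack(len(payload), lsn, zlib.crc32(payload)) + payload


def read_record(buf, off):
    """Return (lsn, next_offset, crc_ok), or None if the record does not fit in buf."""
    if off + HEADER.size > len(buf):
        return None
    length, lsn, crc = HEADER.unpack_from(buf, off)
    end = off + HEADER.size + length
    if end > len(buf):
        return None
    payload = buf[off + HEADER.size:end]
    return lsn, end, zlib.crc32(payload) == crc


def valid_records_after(buf, bad_off, last_good_lsn):
    """Count valid records with increasing LSNs that follow the bad record at bad_off."""
    rec = read_record(buf, bad_off)
    if rec is None:
        # The length field needed to skip the bad record is unusable.
        return 0
    _, off, _ = rec
    count = 0
    prev_lsn = last_good_lsn
    while True:
        rec = read_record(buf, off)
        if rec is None:
            break
        lsn, next_off, crc_ok = rec
        if not crc_ok or lsn <= prev_lsn:
            break
        count += 1
        prev_lsn = lsn
        off = next_off
    return count
```

A non-zero result suggests the bad record is in the middle of the log rather than at its end, so recovery could warn or abort instead of silently stopping there.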
Does the xlp_pageaddr help solve this?

Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.
Apologies if this is a stupid question, but is this mostly an issue due to torn pages? IOW, if we had a way to ensure we never see torn pages, would that mean an invalid CRC on a WAL page indicated there really was corruption on that page?
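
To illustrate why a torn write is hard to tell apart from real corruption, here is a small Python sketch. It assumes a toy 8192-byte page that stores a CRC-32 of its body in the first four bytes and a 512-byte atomic sector size; neither is meant to match PostgreSQL's actual layout, and the helper names are hypothetical. Writing only some sectors of the new page leaves a mix of old and new bytes that fails the checksum even though no byte was damaged by the storage.

```python
import zlib

SECTOR = 512       # assumed atomic write unit
PAGE_SIZE = 8192   # assumed page size


def make_page(body):
    """Build a toy page: 4-byte CRC-32 of the body, then the body."""
    assert len(body) == PAGE_SIZE - 4
    return zlib.crc32(body).to_bytes(4, "little") + bytes(body)


def page_crc_ok(page):
    """Check the stored CRC against the page body."""
    stored = int.from_bytes(page[:4], "little")
    return zlib.crc32(page[4:]) == stored


def torn_write(old, new, sectors_written):
    """Simulate a crash after only the first sectors_written sectors of new hit disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


# The old page is all zeros; the new page changes one byte in the last sector.
old_body = bytes(PAGE_SIZE - 4)
new_body = bytearray(old_body)
new_body[-1] = 0xFF
old_page = make_page(old_body)
new_page = make_page(new_body)

# Only the first sector (holding the new CRC) reached disk before the crash.
torn_page = torn_write(old_page, new_page, 1)
```

Here page_crc_ok(torn_page) is false while both old_page and new_page pass, so the checksum alone cannot say whether the failure came from a torn write or from corruption.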

Maybe it's worth putting (yet more) thought into the torn page issue... :/
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
