Recently, when I was running my application on 8.3.7, my data got
corrupted. The symptom was an error like this: "invalid memory alloc request size ...."

I investigated the bad data and found that one sector of a db-block had
become all-zero (I confirmed the reason later: my disk had gone bad).

I also checked the postmaster log and found 453 ERROR messages saying
"could not read block XXX of relation XXX: ??", where XXX was the db-block
containing the bad sector. After these 453 failed read operations, a read
finally succeeded, but returned an all-zero sector! (I don't know why the
operating system allows this to happen, but it did.)

My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
just report PANIC instead of ERROR when I/O fails? IMO, since the data is
already corrupted, reporting ERROR just leaves us with a very puzzling
situation later, which does more harm than good.
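
For anyone trying to reproduce the count of failed reads mentioned above, here is a minimal sketch that tallies such messages per relation and block. The message shape in the regular expression and the sample log lines are assumptions made for illustration, not copied from a real log; adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumed message shape; adjust the pattern to match your server's log format.
READ_ERROR = re.compile(r"could not read block (\d+) of relation (\S+?):")

def count_read_errors(log_lines):
    """Return a Counter mapping (relation, block) to the number of read errors."""
    counts = Counter()
    for line in log_lines:
        match = READ_ERROR.search(line)
        if match:
            block, relation = int(match.group(1)), match.group(2)
            counts[(relation, block)] += 1
    return counts

# Hypothetical sample lines, made up for this example.
sample = [
    "ERROR:  could not read block 7 of relation 1663/16384/16385: Input/output error",
    "ERROR:  could not read block 7 of relation 1663/16384/16385: Input/output error",
    "LOG:  checkpoint starting",
]

for (relation, block), n in count_read_errors(sample).items():
    print(f"relation {relation} block {block}: {n} failed reads")
```

A block that shows many failed reads followed by silence is a candidate for the "eventually succeeded with bad contents" case described above.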


  • Martijn van Oosterhout at Jun 15, 2009 at 11:27 am

    On Mon, Jun 15, 2009 at 04:41:42PM +0800, Jacky Leng wrote:
    > My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
    > just report PANIC instead of ERROR when I/O fails? IMO, since the data is
    > already corrupted, reporting ERROR just leaves us with a very puzzling
    > situation later, which does more harm than good.

    I think the reasoning is that if those functions reported a PANIC the
    chance you could recover your data is zero, because you need the
    database system to read the other (good) data.

    With an ERROR you can investigate the problem and save what can be
    saved...

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Please line up in a tree and maintain the heap invariant while
    boarding. Thank you for flying nlogn airlines.
  • Tom Lane at Jun 15, 2009 at 2:09 pm

    Martijn van Oosterhout writes:
    > On Mon, Jun 15, 2009 at 04:41:42PM +0800, Jacky Leng wrote:
    >> My question is: shouldn't the mdxxx functions (e.g. mdread, mdwrite, mdsync)
    >> just report PANIC instead of ERROR when I/O fails? IMO, since the data is
    >> already corrupted, reporting ERROR just leaves us with a very puzzling
    >> situation later, which does more harm than good.
    > I think the reasoning is that if those functions reported a PANIC the
    > chance you could recover your data is zero, because you need the
    > database system to read the other (good) data.

    Also, in the case you're complaining about, the problem was that there
    wasn't any O/S error report that we could have PANIC'd about anyhow.

    But Martijn is correct that a PANIC here would reduce the system's
    overall stability without any clear benefit. We already do refuse
    to read a page into shared buffers if there's a read error on it,
    so it's not clear to me how you think that an ERROR leaves things
    in an unstable state.

    regards, tom lane
  • Jacky Leng at Jun 16, 2009 at 2:13 am

    > I think the reasoning is that if those functions reported a PANIC the
    > chance you could recover your data is zero, because you need the
    > database system to read the other (good) data.

    I do not see why a PANIC would reduce the chance of recovering my data.
    AFAICS, my data is already corrupted (because of the bad block), so whether
    we PANIC or not, a read operation on the bad block should get the same result.

    > Also, in the case you're complaining about, the problem was that there
    > wasn't any O/S error report that we could have PANIC'd about anyhow.

    No, the O/S did report the error, which led to the 453 ERROR messages from
    postgres. The O/S error messages (obtained with dmesg) look like this:
    end_request: I/O error, dev sda, sector 504342711
    ata1: EH complete
    SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
    sda: Write Protect is off
    sda: Mode Sense: 00 3a 00 00
    SCSI device sda: drive cache: write back
    ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
    ata1.00: (irq_stat 0x40000008)
    ata1.00: cmd 60/08:00:b0:a8:0f/00:00:1e:00:00/40 tag 0 cdb 0x0 data 4096 in
             res 41/40:08:b7:a8:0f/06:00:1e:00:00/00 Emask 0x9 (media error)
    ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
    ata1.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168

    > We already do refuse
    > to read a page into shared buffers if there's a read error on it,
    > so it's not clear to me how you think that an ERROR leaves things
    > in an unstable state.

    In my case, it seems that the O/S does not guarantee that once an I/O
    operation (read, write, sync, etc.) on a block has failed, all later I/O
    operations on that block will also fail. For example:
    1. As I noted before, although reads of the bad db-block failed 453 times,
    the 454th read succeeded (but with the bad sector set to all-zero). So even
    though the 453 failed reads reported ERROR, there is still a chance that
    the bad db-block gets read into shared buffers.
    2. Besides, I have noticed a scenario like this: 1) an mdsync operation
    failed with the message "ERROR: could not fsync segment XXX of relation
    XXX: ??";

    The O/S error message (obtained with the dmesg command) looks like this:
    Buffer I/O error on device ^A&#63733;XX205503, logical block 43837786
    lost page write due to I/O error on ^A&#63733;XX205503

    2) This leaves a half-written db-block in my data. But the page can still
    be read into shared buffers successfully later, which leads to a puzzling
    error: "ERROR: could not access status of transaction XXXXX"
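
The "read succeeded but one sector is all-zero" case from point 1 can be looked for offline. What follows is a minimal sketch, not a PostgreSQL tool: it assumes the default 8192-byte block size and the 512-byte sectors shown in the dmesg output, and the function name `find_zeroed_sectors` is hypothetical. It is only a heuristic, since unused space inside a healthy page can legitimately be zero, so any hit needs manual inspection.

```python
BLOCK_SIZE = 8192   # PostgreSQL's default block size (assumed; BLCKSZ is a build option)
SECTOR_SIZE = 512   # matches the "512-byte hdwr sectors" line in the dmesg output

def find_zeroed_sectors(data, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Yield (block_number, sector_index) for all-zero sectors in non-empty blocks."""
    zero_sector = bytes(sector_size)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if not any(block):
            continue  # a completely zero block looks like an unused page, skip it
        for start in range(0, len(block), sector_size):
            if block[start:start + sector_size] == zero_sector:
                yield offset // block_size, start // sector_size

# Synthetic demo data: one intact block, then one block whose 4th sector is zeroed.
good = b"\x01" * BLOCK_SIZE
damaged = bytearray(b"\x01" * BLOCK_SIZE)
damaged[3 * SECTOR_SIZE:4 * SECTOR_SIZE] = bytes(SECTOR_SIZE)

print(list(find_zeroed_sectors(good + bytes(damaged))))
```

For a real relation segment you would read the file in binary mode, ideally one block at a time rather than all at once, and run this only against a copy taken while the server is stopped.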

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Jun 15, '09 at 8:43a
active: Jun 16, '09 at 2:13a
posts: 4
users: 3
website: postgresql.org...
irc: #postgresql
