I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
using iSCSI storage. Recently we've begun to experience journal aborts
resulting in remounted-read-only filesystems as well as other filesystem
issues - I can unmount a filesystem and force a check with "fsck -f" and
occasionally find errors.

I've found -
<https://bugzilla.redhat.com/show_bug.cgi?id=228108>
<http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306>
- which seem related but I believe I am running a kernel that contains
these fixes.

My kernel is 2.6.18-194.32.1.el5 on one of the most affected hosts.
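One way to double-check a belief like "my kernel contains these fixes" is to compare the running kernel version against the first release that carries the fix. The sketch below assumes GNU `sort -V` (version sort) is available; the function name `kernel_at_least` is hypothetical, and the minimum version in the usage comment is a placeholder, not the actual fixed release for the bug or KB above.

```shell
#!/bin/bash
# kernel_at_least RUNNING MINIMUM   (hypothetical helper)
# Succeeds (exit 0) when RUNNING sorts at or after MINIMUM under GNU version sort.
kernel_at_least() {
    local running="$1" minimum="$2"
    local lowest
    # sort -V compares the numeric parts of version strings numerically,
    # so the older of the two versions ends up on the first line.
    lowest=$(printf '%s\n%s\n' "$minimum" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Example (2.6.18-164.el5 is a PLACEHOLDER minimum, not a confirmed fixed release):
#   kernel_at_least "$(uname -r)" 2.6.18-164.el5 && echo "at or above the placeholder minimum"
```

A version comparison only shows ordering; searching the output of `rpm -q --changelog kernel` for the bug number is a more direct way to confirm that a specific fix was backported into the installed kernel.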

Does anyone else have experience with similar issues or know of the
status of this Bug/KB?

I can install, boot, run, and then at some seemingly random moment -

init_special_inode: bogus i_mode (50632)
init_special_inode: bogus i_mode (137147)
init_special_inode: bogus i_mode (172036)
init_special_inode: bogus i_mode (175720)
init_special_inode: bogus i_mode (72350)
init_special_inode: bogus i_mode (174751)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698169 in dir #19696695
Aborting journal on device sdb2.
init_special_inode: bogus i_mode (165661)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698131 in dir #19696695
init_special_inode: bogus i_mode (76763)
init_special_inode: bogus i_mode (3116)
init_special_inode: bogus i_mode (75363)
init_special_inode: bogus i_mode (77034)
init_special_inode: bogus i_mode (132237)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698139 in dir #19696695
init_special_inode: bogus i_mode (53031)
init_special_inode: bogus i_mode (33361)
init_special_inode: bogus i_mode (77546)
init_special_inode: bogus i_mode (6516)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698143 in dir #19696695
init_special_inode: bogus i_mode (6442)
init_special_inode: bogus i_mode (72554)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698142 in dir #19696695
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698164 in dir #19696695
init_special_inode: bogus i_mode (73171)
init_special_inode: bogus i_mode (154432)
init_special_inode: bogus i_mode (34302)
init_special_inode: bogus i_mode (131733)
init_special_inode: bogus i_mode (30773)
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
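When the failure shows up at a "seemingly random moment", a quick count of each message type in the kernel log can help compare hosts or incidents. A minimal sketch, assuming the kernel messages are available as plain text (for example from `dmesg` or `/var/log/messages`); the function name `ext3_error_summary` is hypothetical, and the patterns are simply the message fragments from the log above.

```shell
#!/bin/bash
# ext3_error_summary   (hypothetical helper)
# Reads kernel log text on stdin and counts the message types seen in the log above.
ext3_error_summary() {
    awk '
        /init_special_inode: bogus i_mode/ { bogus++ }
        /ext3_lookup: unlinked inode/      { unlinked++ }
        /Aborting journal on device/       { aborted++ }
        /Remounting filesystem read-only/  { ro++ }
        END {
            # Counters that never matched are uninitialized and print as 0.
            printf "bogus_i_mode=%d unlinked_inode=%d journal_abort=%d remount_ro=%d\n",
                   bogus, unlinked, aborted, ro
        }'
}

# Usage:
#   dmesg | ext3_error_summary
#   ext3_error_summary < /var/log/messages
```

The counts only describe symptoms; they do not say whether the guest, the virtual disk, or the storage path is at fault.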


  • Kwan Lowe at Feb 13, 2011 at 9:29 am

    On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams wrote:
    I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
    using iSCSI storage. Recently we've begun to experience journal aborts
    resulting in remounted-read-only filesystems as well as other filesystem
    issues - I can unmount a filesystem and force a check with "fsck -f" and
    occasionally find errors.

    I've found -
    <https://bugzilla.redhat.com/show_bug.cgi?id=228108>
    <http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306>
    - which seem related but I believe I am running a kernel that contains
    these fixes.
    I ran into a similar problem, but it was not specifically iSCSI. We
    ended up adding a setting to sysctl.conf. Give me a few minutes and I
    will find the setting.
  • Kwan Lowe at Feb 13, 2011 at 9:40 am

    On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams wrote:
    I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
    using iSCSI storage. Recently we've begun to experience journal aborts
    resulting in remounted-read-only filesystems as well as other filesystem
    issues - I can unmount a filesystem and force a check with "fsck -f" and
    occasionally find errors.
    http://communities.vmware.com/message/245983

    The setting we used to resolve it was vm.min_free_kbytes = 8192.

    Previous to this we were seeing the error pop up every week or so.
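A value set with `sysctl -w` lasts only until the next reboot; to make it permanent, the line goes into `/etc/sysctl.conf` (where spaces around `=` are allowed) and `sysctl -p` reloads that file. Below is a minimal sketch of an idempotent edit, assuming GNU grep and GNU sed (`sed -i`); the function name `set_sysctl_conf` is hypothetical, and it takes the file path as an argument so it can be tried on a copy before touching the real file.

```shell
#!/bin/bash
# set_sysctl_conf KEY VALUE FILE   (hypothetical helper)
# Replaces an existing "KEY = ..." line in FILE, or appends one if none exists.
set_sysctl_conf() {
    local key="$1" value="$2" file="$3"
    local pattern
    # Escape the dots in the key so grep/sed match them literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        # Key already present (commented-out lines are ignored): rewrite it in place.
        sed -i -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# On the guest, as root:
#   set_sysctl_conf vm.min_free_kbytes 8192 /etc/sysctl.conf
#   sysctl -p    # reload /etc/sysctl.conf
```

Re-running it with a different value (for example 131072) replaces the existing line instead of adding a second one.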
  • Keith Beeby at Feb 13, 2011 at 3:28 pm
    Hi

    Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article this should have been resolved in v5.1 update??.

    Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5 as well to resolve the issue?
  • Adam Tauno Williams at Feb 13, 2011 at 7:03 pm

    On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
    Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared
    storage; according to the VMware knowledge base article this should
    have been resolved in v5.1 update??.
    Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5
    as well to resolve the issue?
    I guess we'll see [this issue has become extremely frustrating].

    I suppose it is 'good' to see that someone else sees the issue as well.
    One issue with virtualization is that debugging these types of issues is
    an order-of-magnitude more difficult [virtualized OS, virtualized
    storage, virtualization platform, or some interaction of all the
    above... ugh].
  • Bazooka Joe at Feb 14, 2011 at 4:01 pm

    On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams wrote:
    On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
    Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared
    storage; according to the VMware knowledge base article this should
    have been resolved in v5.1 update??.
    Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5
    as well to resolve the issue?
    I guess we'll see [this issue has become extremely frustrating].

    I suppose it is 'good' to see that someone else sees the issue as well.
    One issue with virtualization is that debugging these types of issues is
    an order-of-magnitude more difficult [virtualized OS, virtualized
    storage, virtualization platform, or some interaction of all the
    above... ugh].

    I am experiencing the same issue.

    CentOS: current
    ESXi v3.5 update 5
    storage: NFS

    I am in the process of rebuilding the virtual server using a different
    OS, thinking it was just file system errors.

    -bazooka
  • Adam Tauno Williams at Feb 14, 2011 at 5:02 pm

    On Mon, 2011-02-14 at 13:01 -0800, Bazooka Joe wrote:
    I am in the process of rebuilding the virtual server using a different
    os thinking it was just file system errors.
    What other OS?

    I've experienced one [possibly unrelated] corruption of the /tmp
    filesystem on an openSUSE 11.1 VM. So far Windows VMs seem immune to
    the issue.
  • Adam Tauno Williams at Feb 13, 2011 at 7:00 pm

    On Sun, 2011-02-13 at 09:40 -0500, Kwan Lowe wrote:
    http://communities.vmware.com/message/245983
    The setting we used to resolve was vm.min_free_kbytes = 8192
    Previous to this we were seeing the error pop up every week or so.
    You made this change to the *virtual machine* [not the host OS]?

    This thread indicates this was with VMware Workstation and not ESX
    (correct)?
  • Kwan Lowe at Feb 14, 2011 at 5:36 am

    On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams wrote:
    http://communities.vmware.com/message/245983
    The setting we used to resolve was vm.min_free_kbytes = 8192
    Previous to this we were seeing the error pop up every week or so.
    You made this change to the *virtual machine* [not the host OS]?

    This thread indicates this was with VMware Workstation and not ESX
    (correct)?
    This was done on the CentOS and RHEL guests on VMware ESX hosts.
  • Keith Beeby at Feb 14, 2011 at 7:08 am
    Hi,

    So the 'fix' is applied directly to the host os, is this the correct thing to do?

    sysctl -w vm.min_free_kbytes = 8192

    Keith
  • Adam Tauno Williams at Feb 14, 2011 at 8:00 am

    On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
    Hi,
    So the 'fix' is applied directly to the host os,
    no, to the *guest* OS instances. [please, do not top-post].
    is this the correct thing to do?
    sysctl -w vm.min_free_kbytes = 8192
    No space(s) I believe.

    sysctl -w vm.min_free_kbytes=8192
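`sysctl -w` expects each assignment as a single `name=value` argument; with spaces, the shell passes `vm.min_free_kbytes`, `=` and `8192` as three separate arguments, none of which is a complete assignment. Each dotted sysctl name also corresponds to a file under `/proc/sys` (dots become slashes), which gives a second way to read the value back. A small sketch; the helper name `sysctl_proc_path` is hypothetical.

```shell
#!/bin/bash
# sysctl_proc_path NAME   (hypothetical helper)
# Maps a dotted sysctl name to its /proc/sys file, e.g.
# vm.min_free_kbytes -> /proc/sys/vm/min_free_kbytes.
# Simple mapping only: it does not handle names whose components contain dots.
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# On the guest, as root:
#   sysctl -w vm.min_free_kbytes=8192              # one argument, no spaces
#   sysctl -n vm.min_free_kbytes                   # read back via sysctl
#   cat "$(sysctl_proc_path vm.min_free_kbytes)"   # read back via /proc
```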

    I'm still not entirely clear as to why this setting should/will make a
    difference in maintaining filesystem integrity.

    On "Jun 20, 2007" in the aforementioned thread there is the comment:
    "RHEL5 still needs a "fix" as well, and since it's not yet officially
    supported from VMware for ESX my guess is it won't get a formal fix
    until it is certified. I plan to post a patched driver for RHEL5 on my
    website in the next day or so." - but the comment is from *2007* and
    RHEL5 is now certified.

    <http://communities.vmware.com/message/881727#881727> seems like an
    update that describes my issue; but even that is from 2008.

    Reference: VMware KB#1001778 (Note: RHEL5U1 is long since released)
  • Kwan Lowe at Feb 14, 2011 at 8:31 am

    On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams wrote:
    On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
    Hi,
    So the 'fix' is applied directly to the host os,
    no, to the *guest* OS instances. [please, do not top-post].
    is this the correct thing to do?
    sysctl -w vm.min_free_kbytes = 8192
    No space(s) I believe.

    sysctl -w vm.min_free_kbytes=8192

    I'm still not entirely clear as to why this setting should/will make a
    difference in maintaining filesystem integrity.
    It's certainly possible that the error I was receiving had a different
    cause, though the symptoms were similar. We started seeing filesystems go
    read-only, and only rebooting would clear it up.
  • Johnny Hughes at Feb 14, 2011 at 1:37 pm

    On 02/14/2011 07:31 AM, Kwan Lowe wrote:
    It's certainly possible that the error I was receiving was a different
    reason, though similar symptoms. We started seeing filesystems go
    read-only, and only rebooting would clear it up.
    I use that setting on the "Host OS" for VMware to prevent a whole VM
    from getting killed.

    That setting maintains a minimum amount of free memory, so that a large
    program that requests memory quickly cannot deplete all available memory
    and cause the OOM killer to kill the process using the most RAM.

    If you are on a host OS box, the biggest memory consumers are your VMs,
    and getting one killed off because free memory reaches zero is not good.

    I don't have any idea how it would fix journal errors on a drive, but I
    guess it could.

    I set it much higher than 8192 on the host machines ... I set it to 131072.
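Since `vm.min_free_kbytes` is expressed in KiB, 8192 reserves 8 MiB and 131072 reserves 128 MiB. A small sketch for putting either number in proportion to a machine's RAM; it assumes `MemTotal` from `/proc/meminfo` (also reported in kB) as the total, and the function name `min_free_report` is hypothetical.

```shell
#!/bin/bash
# min_free_report MIN_FREE_KIB MEMTOTAL_KIB   (hypothetical helper)
# Prints the reserve in MiB and as a percentage of total RAM.
min_free_report() {
    awk -v min="$1" -v total="$2" 'BEGIN {
        printf "%d MiB reserved (%.2f%% of RAM)\n", min / 1024, 100 * min / total
    }'
}

# Usage with the live values:
#   min_free_report "$(cat /proc/sys/vm/min_free_kbytes)" \
#                   "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

For example, on a host with 16 GiB (16777216 kB) of RAM, 131072 works out to 128 MiB, about 0.78% of memory.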

  • Kwan Lowe at Feb 14, 2011 at 1:49 pm

    It's certainly possible that the error I was receiving was a different
    reason, though similar symptoms. We started seeing filesystems go
    read-only, and only rebooting would clear it up.
    I use that setting on the "Host OS" for VMWare to prevent a whole vm
    from getting killed.

    That setting will maintain a minimum amount of free memory available to
    prevent a large program that requests memory quick from depleting all
    available memory and causing the program killer from killing the highest
    RAM process.

    If you are on a Host OS box, the biggest Memory processes are your VMs,
    and getting one killed off because memory reaches zero is not good.

    I don't have any idea how it would fix journal errors on a drive, but I
    guess it could.
    It's been a few years since I put in the tuning, but here's some info
    that might be useful:

    http://communities.vmware.com/thread/20690?start=0&tstart=0

    In particular, others had reported seeing this error:

    "kernel: journal_get_undo_access: No memory for committed data".

    I don't recall that error in my case, but it might explain why the tuning
    fixed the problem. There's a bugzilla for this:

    https://bugzilla.redhat.com/show_bug.cgi?id9605
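To check whether a given abort matches the low-memory case described above, one can look for the "No memory for committed data" message in the same log that shows the journal abort. A minimal sketch reading kernel log text on stdin; the function name `classify_journal_abort` and its output labels are hypothetical. The absence of the message does not rule out memory pressure; it only means that particular message was not logged.

```shell
#!/bin/bash
# classify_journal_abort   (hypothetical helper)
# Reads kernel log text on stdin. Reports whether a journal abort is present and
# whether the "No memory for committed data" message appears alongside it.
classify_journal_abort() {
    awk '
        /Aborting journal on device/                            { aborted = 1 }
        /journal_get_undo_access: No memory for committed data/ { nomem = 1 }
        END {
            if (!aborted) {
                print "no journal abort"
            } else if (nomem) {
                print "journal abort with memory-allocation failure"
            } else {
                print "journal abort without memory-allocation message"
            }
        }'
}

# Usage:
#   classify_journal_abort < /var/log/messages
```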

Discussion Overview
group: centos
categories: centos
posted: Feb 13, '11 at 9:09a
active: Feb 14, '11 at 5:02p
posts: 14
users: 5
website: centos.org
irc: #centos


site design / logo © 2022 Grokbase