I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
process basically scans through Maildirs, checking for space usage and
quota. Because there are a hundred-odd user folders and several tens of
thousands of small files, this sends the I/O wait percentage way up. The
server hits a very high load level and stops responding to other
requests until the crawl is done.

I am wondering if I add another disk and symlink the sub-directories
to that, would that free up the server to respond to other requests
despite the wait on that disk?

Alternatively, if I mdraid mirror the existing disk, would md be smart
enough to read using the other disk while the first's tied up with the
first process?

  • Benjamin Franz at Jun 9, 2011 at 6:38 am

    On 06/09/2011 02:24 AM, Emmanuel Noobadmin wrote:
    I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
    process basically scans through Maildirs, checking for space usage and
    quota. Because there are hundred odd user folders and several 10s of
    thousands of small files, this sends the I/O wait % way high. The
    server hits a very high load level and stops responding to other
    requests until the crawl is done.

    I am wondering if I add another disk and symlink the sub-directories
    to that, would that free up the server to respond to other requests
    despite the wait on that disk?

    Alternatively, if I mdraid mirror the existing disk, would md be smart
    enough to read using the other disk while the first's tied up with the
    first process?
    You should look at running your process using 'ionice -c3 program'. That
    way it won't starve everything else for I/O cycles. Also, you may want
    to experiment with using the 'deadline' elevator instead of the default
    'cfq' (see http://www.redhat.com/magazine/008jun05/features/schedulers/
    and http://www.wlug.org.nz/LinuxIoScheduler). Neither of those would
    require you to change your hardware out. Also, setting 'noatime' in
    the mount options for the partition holding the files will reduce the
    number of required I/Os quite a lot.

    But yes, in general, distributing your load across more disks should
    improve your I/O profile.

    --
    Benjamin Franz
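
    A minimal sketch of the suggestions above, assuming the scan is a
    script at /usr/local/bin/quota-scan and the Maildirs live on a
    partition mounted at /var/mail (both names are hypothetical):

    # run the crawler in the idle I/O class so other requests win
    ionice -c3 /usr/local/bin/quota-scan
    # stop recording access times on the Maildir partition
    mount -o remount,noatime /var/mail
    # make that permanent by adding noatime to the partition's options
    # field in /etc/fstab, e.g. "defaults,noatime"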
  • Emmanuel Noobadmin at Jun 9, 2011 at 12:15 pm

    On 6/9/11, Mathias Burén wrote:
    The first thing that comes to my mind: Have you tried another IO scheduler?
    and the first thing that came to this noob's mind was: Wait, you mean
    there's actually more than one? AND I get to choose?

    I'll probably be experimenting with deadline and anticipatory, since
    the I/O wait seems to be due to the disk running back and forth trying
    to serve the file scan as well as legitimate read requests, so having
    that small wait for reads in the same area sounds like it would help.
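
    For reference, the scheduler can be inspected and switched per block
    device at runtime; sda below is a placeholder for the actual disk:

    # show the available schedulers, the active one is in brackets
    cat /sys/block/sda/queue/scheduler
    # switch this disk to deadline without a reboot
    echo deadline > /sys/block/sda/queue/scheduler
    # to make it the default for every disk, append elevator=deadline
    # to the kernel line in /boot/grub/grub.conf and reboot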
  • Emmanuel Noobadmin at Jun 9, 2011 at 12:48 pm

    On 6/9/11, Benjamin Franz wrote:

    You should look at running your process using 'ionice -c3 program'. That
    way it won't starve everything else for I/O cycles. Also, you may want
    to experiment with using the 'deadline' elevator instead of the default
    'cfq' (see http://www.redhat.com/magazine/008jun05/features/schedulers/
    and http://www.wlug.org.nz/LinuxIoScheduler). Neither of those would
    require you to change your hardware out. Also, setting 'noatime' for the
    mount options for partition holding the files will reduce the number of
    required I/Os quite a lot.
    Thanks for pointing out noatime; I came across it in my reading
    previously but it never sank in. This experience is definitely going
    to make sure of that :)

    The crawl process is started by another program: crond starts the
    program, and the program starts the email crawl or takes other more
    crucial action depending on the situation, so I'm unsure if I should
    run it with ionice since it could potentially cause the more crucial
    action to lag/slow down.

    But I'll give it a try anyway over the weekend, when any negative
    effect has lesser consequences, and see if it affects other things.
    But yes, in general, distributing your load across more disks should
    improve your I/O profile.
    I'm going with noatime and ionice first to see the impact before I
    start playing around with the i/o scheduler. If all else fails, then
    I'll see about requesting for the extra hard disk.
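
    One way to try it without touching the crucial path, assuming the
    wrapper and crawler paths below (both hypothetical), is to leave the
    cron entry alone and lower the priority of only the crawl step inside
    the wrapper:

    #!/bin/sh
    # wrapper started by crond; urgent work keeps its normal priority
    /usr/local/bin/crucial-action
    # only the Maildir crawl is demoted to the idle I/O class
    # (and to the lowest CPU priority)
    nice -n 19 ionice -c3 /usr/local/bin/maildir-crawl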
  • Markus Falb at Jun 9, 2011 at 12:59 pm

    On 9.6.2011 12:38, Benjamin Franz wrote:
    On 06/09/2011 02:24 AM, Emmanuel Noobadmin wrote:
    I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
    process basically scans through Maildirs, checking for space usage and
    quota. Because there are hundred odd user folders and several 10s of
    thousands of small files, this sends the I/O wait % way high. The
    server hits a very high load level and stops responding to other
    requests until the crawl is done.
    setting 'noatime' for the
    mount options for partition holding the files will reduce the number of
    required I/Os quite a lot.
    Yes, but before doing this, be sure that your software does not need atime.

    --
    Kind Regards, Markus Falb

  • Rudi Ahlers at Jun 9, 2011 at 1:04 pm

    On Thu, Jun 9, 2011 at 12:38 PM, Benjamin Franz wrote:
    On 06/09/2011 02:24 AM, Emmanuel Noobadmin wrote:
    I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
    process basically scans through Maildirs, checking for space usage and
    quota. Because there are hundred odd user folders and several 10s of
    thousands of small files, this sends the I/O wait % way high. The
    server hits a very high load level and stops responding to other
    requests until the crawl is done.

    I am wondering if I add another disk and symlink the sub-directories
    to that, would that free up the server to respond to other requests
    despite the wait on that disk?

    Alternatively, if I mdraid mirror the existing disk, would md be smart
    enough to read using the other disk while the first's tied up with the
    first process?
    You should look at running your process using 'ionice -c3 program'. That
    way it won't starve everything else for I/O cycles. Also, you may want
    to experiment with using the 'deadline' elevator instead of the default
    'cfq' (see http://www.redhat.com/magazine/008jun05/features/schedulers/
    and http://www.wlug.org.nz/LinuxIoScheduler). Neither of those would
    require you to change your hardware out. Also, setting 'noatime' for the
    mount options for partition holding the files will reduce the number of
    required I/Os quite a lot.

    But yes, in general, distributing your load across more disks should
    improve your I/O profile.

    --
    Benjamin Franz

    Can one mount the root filesystem with noatime?


    --
    Kind Regards
    Rudi Ahlers
    SoftDux

    Website: http://www.SoftDux.com
    Technical Blog: http://Blog.SoftDux.com
    Office: 087 805 9573
    Cell: 082 554 7532
  • Emmanuel Noobadmin at Jun 9, 2011 at 1:09 pm

    On 6/10/11, Markus Falb wrote:
    Yes, but before doing this be sure that your Software does not need atime.
    For a brief moment, I had that sinking "Oh No... why didn't I see this
    earlier" feeling especially since I've already remounted the
    filesystem with noatime.

    Fortunately, so far it seems that everything's still alive and
    working, keeping fingers crossed :D
  • John R Pierce at Jun 9, 2011 at 1:26 pm

    On 06/09/11 2:24 AM, Emmanuel Noobadmin wrote:
    Alternatively, if I mdraid mirror the existing disk, would md be smart
    enough to read using the other disk while the first's tied up with the
    first process?
    that would be my first choice, and yes, queued read IO could be
    satisfied by either mirror, hence they'd have double the read performance.

    next step would be a raid 1+0 with yet more disks.

    --
    john r pierce N 37, W 122
    santa cruz ca mid-left coast
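
    A sketch of building such a mirror with md, assuming two spare
    partitions /dev/sdb1 and /dev/sdc1 (converting the existing single
    disk in place takes a few extra steps not shown here):

    # create a two-disk RAID1 array and watch the initial resync
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    cat /proc/mdstat
    # record the array so it is assembled at boot
    mdadm --detail --scan >> /etc/mdadm.conf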
  • Les Mikesell at Jun 9, 2011 at 1:30 pm

    On 6/9/2011 12:09 PM, Emmanuel Noobadmin wrote:
    On 6/10/11, Markus Falb wrote:
    Yes, but before doing this be sure that your Software does not need atime.
    For a brief moment, I had that sinking "Oh No... why didn't I see this
    earlier" feeling especially since I've already remounted the
    filesystem with noatime.

    Fortunately, so far it seems that everything's still alive and
    working, keeping fingers crossed :D
    Some email software might use it to see if something has been updated
    since being read.

    --
    Les Mikesell
    lesmikesell at gmail.com
  • Devin Reade at Jun 9, 2011 at 2:28 pm
    --On Thursday, June 09, 2011 07:04:24 PM +0200, Rudi Ahlers wrote:
    Can one mount the root filesystem with noatime?
    Generally speaking, one can mount any of the filesystems with noatime.
    Whether or not this is a good thing depends on your use. As was
    previously mentioned, some software (but not a lot) depends on it.
    The only thing that comes to mind offhand is mail software that
    uses a single-file monolithic mailbox. (Cyrus IMAPd, for example,
    uses one file per message, so noatime doesn't affect its behavior).

    With noatime, you also (obviously) lose the ability to look at
    access times. *Once*, in my career, that was useful for doing
    forensics on a cracked system.

    OTOH, it can make a good performance improvement. On SSDs, it can
    also help extend the drive's life.

    Devin
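
    To illustrate the answer above: the root filesystem takes noatime
    like any other, via /etc/fstab plus a live remount, e.g.:

    # /etc/fstab root entry with noatime added to the options field:
    # LABEL=/   /   ext3   defaults,noatime   1 1
    # apply it without rebooting
    mount -o remount,noatime /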
  • Thomas Harold at Jun 9, 2011 at 2:47 pm

    On 6/9/2011 1:09 PM, Emmanuel Noobadmin wrote:
    On 6/10/11, Markus Falb wrote:
    Yes, but before doing this be sure that your Software does not need atime.
    For a brief moment, I had that sinking "Oh No... why didn't I see this
    earlier" feeling especially since I've already remounted the
    filesystem with noatime.

    Fortunately, so far it seems that everything's still alive and
    working, keeping fingers crossed :D
    The last access time is generally not needed, especially for Maildirs.
    On our setup, Postfix and Dovecot don't care. I always mount as many
    file systems as possible with 'noatime'.

    (Our IMAP Maildir storage is a 4-disk RAID 1+0 array with a few million
    individual messages across a lot of accounts.)
  • Thomas Harold at Jun 9, 2011 at 2:51 pm

    On 6/9/2011 1:26 PM, John R Pierce wrote:
    On 06/09/11 2:24 AM, Emmanuel Noobadmin wrote:
    Alternatively, if I mdraid mirror the existing disk, would md be smart
    enough to read using the other disk while the first's tied up with the
    first process?
    that would be my first choice, and yes, queued read IO could be
    satisfied by either mirror, hence they'd have double the read performance.

    next step would be a raid 1+0 with yet more disks.
    mdadm is good, but you'll never get double the read performance. Even
    on our 3-way mirrors (RAID 1, 3 active disks), we don't come close to
    even twice the performance of a single disk.

    RAID 1+0 with 4/6/8 spindles is the best way to ensure that you get
    better performance.

    Adding RAM to the server so that you have a larger read buffer might
    also be an option.
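
    A sketch of the four-spindle RAID 1+0 idea using md's raid10
    personality, with hypothetical device names and mount point:

    # striped mirrors across four disks
    mdadm --create /dev/md1 --level=10 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
    # filesystem and mount point are placeholders
    mkfs.ext3 /dev/md1
    mount -o noatime /dev/md1 /var/mail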
  • Devin Reade at Jun 9, 2011 at 3:04 pm

    --On Thursday, June 09, 2011 12:28:28 PM -0600 Devin Reade wrote:

    The only thing that comes to mind offhand is mail software that
    uses a single-file monolithic mailbox.
    Another message reminded me that most such software is probably
    basing its checks off of the mtime anyway.

    Devin
  • Steve Thompson at Jun 9, 2011 at 3:06 pm

    On Thu, 9 Jun 2011, Emmanuel Noobadmin wrote:

    I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
    process basically scans through Maildirs, checking for space usage and
    quota. Because there are hundred odd user folders and several 10s of
    thousands of small files, this sends the I/O wait % way high. The
    server hits a very high load level and stops responding to other
    requests until the crawl is done.
    If the server is reduced to a crawl, it's possible that you are hitting
    the dirty_ratio limit due to writes and the server has entered synchronous
    I/O mode. As others have mentioned, setting noatime could have a
    significant effect, especially if there are many files and the server
    doesn't have much memory. You can try increasing dirty_ratio to see if it
    has an effect, e.g.:

    # sysctl vm.dirty_ratio
    # sysctl -w vm.dirty_ratio=<new value>

    Steve
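
    If raising it helps, the related knobs and a persistent setting look
    roughly like this (the value 40 is purely illustrative):

    # show the current thresholds
    sysctl vm.dirty_ratio vm.dirty_background_ratio
    # try a higher foreground threshold for this boot only
    sysctl -w vm.dirty_ratio=40
    # keep it across reboots
    echo 'vm.dirty_ratio = 40' >> /etc/sysctl.conf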
  • Steven Tardy at Jun 9, 2011 at 4:00 pm

    On 06/09/11 11:48, Emmanuel Noobadmin wrote:
    I'm going with noatime and ionice first
    did you set noatime on the host filesystem and/or the VM filesystem?

    i would think noatime on the VM would provide more benefit than on the host...
    shrug. now my brain hurts. gee thanks. (:

    --
    Steven Tardy
    Systems Analyst
    Information Technology Infrastructure
    Information Technology Services
    Mississippi State University
    sjt5 at its.msstate.edu
  • Emmanuel Noobadmin at Jun 9, 2011 at 11:21 pm

    On 6/10/11, Steven Tardy wrote:
    did you set noatime on the host filesystem and/or the VM filesystem?
    i would think noatime on the VM would provide more benefit than on the
    host...
    shrug. now my brain hurts. gee thanks. (:
    I was trying it on the host first, thinking that would cut down on
    half the writes, since the host wouldn't have to update the atime
    every time the disk files are accessed.

    But now that you brought it up, I'm wondering if that would have been
    pointless. If the kernel considers KVM opening the disk file and
    holding onto it as a single access, regardless of how many subsequent
    reads/writes there are, then this wouldn't make a difference, would it?
  • Gordon Messmer at Jun 10, 2011 at 3:09 am

    On 06/09/2011 08:21 PM, Emmanuel Noobadmin wrote:
    But now that you brought it up, I'm wondering if that would have been
    pointless. If the kernel considers KVM opening the disk file and
    holding onto it as a single access, regardless of how many subsequent
    reads/writes there are, then this wouldn't make a difference, would it?
    atime and mtime are updated for *every* read and write operation, not
    for the open() of the file.

    That aside, if you're running KVM I strongly recommend using LVM rather
    than file-backed VM guests. It's more work to set up, but you'll see
    drastically better IO performance in the guests. One system that I
    measured had a write speed of around 8 MB/s for sequential block output
    on file-backed VMs. LVM backed VMs wrote at around 56 MB/s for
    sequential block output.

    You should *never* use file-backed VMs for production systems.
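
    On the host, the LVM-backed alternative is just a dedicated logical
    volume per guest; a sketch assuming a volume group named vg0 and a
    40 GB guest disk (both assumptions):

    # carve out a logical volume for the guest
    lvcreate -L 40G -n guest01-disk vg0
    # the guest then uses /dev/vg0/guest01-disk as its (virtio) block
    # device instead of a qcow2 file under /var/lib/libvirt/images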
  • Emmanuel Noobadmin at Jun 10, 2011 at 11:17 pm

    On 6/10/11, Gordon Messmer wrote:
    atime and mtime are updated for *every* read and write operation, not
    for the open() of the file.
    OK. In any case, the combination of noatime and ionice on the cron job
    seems to have helped; no lock-ups in the past 24 hours. But it is a
    Saturday here, so that might just be due to light usage; keeping
    fingers crossed.
    That aside, if you're running KVM I strongly recommend using LVM rather
    than file-backed VM guests. It's more work to set up, but you'll see
    drastically better IO performance in the guests. One system that I
    measured had a write speed of around 8 MB/s for sequential block output
    on file-backed VMs. LVM backed VMs wrote at around 56 MB/s for
    sequential block output.

    You should *never* use file-backed VMs for production systems.
    The irony of it is that I decided to go with qcow2 because I thought
    it would avoid the overhead of an additional LVM layer while also
    providing snapshot capabilities :(

    Since I don't have enough spare space left on this particular system,
    I'll probably have to get them to agree to add an extra disk to do the
    LVM volumes, then figure out how to migrate the VM over from file to
    raw/partition.
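
    Once a logical volume is available (as sketched earlier), the
    file-to-block migration can be roughly this, with the guest shut
    down first and all names being assumptions:

    # write the qcow2 contents onto the LV as a raw image
    qemu-img convert -O raw /var/lib/libvirt/images/mailsrv.qcow2 /dev/vg0/mailsrv-disk
    # then point the guest definition at the block device
    virsh edit mailsrv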
  • Gordon Messmer at Jun 15, 2011 at 6:39 pm

    On 06/10/2011 08:17 PM, Emmanuel Noobadmin wrote:
    The irony of it is that I decided to go with qcow2 because I thought
    it would avoid the overhead of an additional LVM layer while also
    providing snapshot capabilities :(
    I read somewhere recently that people were complaining about LVM
    overhead and poor performance, but I've never seen any evidence of it.
    Was there something that made you think that LVM had significant
    overhead?
  • Emmanuel Noobadmin at Jun 15, 2011 at 10:04 pm

    On 6/16/11, Gordon Messmer wrote:

    I read somewhere recently that people were complaining about LVM
    overhead and poor performance, but I've never seen any evidence of it.
    Was there something that made you think that LVM had significant
    overhead?
    Looking at some very sparse notes I made on the decision, I think what
    tipped the choice was that both qcow2 and LVM added overhead, but
    LVM's applied to the whole system, i.e. the host incurs additional
    processing on every I/O, whereas qcow2's overhead applied only to
    guest I/O. More critically, my notes record the thought that it would
    be easier to move a qcow2 file to another machine/disk, if necessary,
    than to move a partition.
  • Gordon Messmer at Jun 16, 2011 at 3:08 am

    On 06/15/2011 07:04 PM, Emmanuel Noobadmin wrote:
    Looking at some very sparse notes I made on the decision, I think what
    tipped the choice was that both qcow2 and LVM added overhead, but
    LVM's applied to the whole system, i.e. the host incurs additional
    processing on every I/O, whereas qcow2's overhead applied only to
    guest I/O.
    I think you were misinformed, or misled. LVM should not present any
    noticeable overhead on the host. Using "raw" files to back VMs presents
    a significant overhead to guests; the host performs all IO through its
    filesystem. Using "qcow2" files presents even more overhead (probably
    the most of any configuration) since there are complexities to the qcow2
    file itself in addition to the host's filesystem.
    More
    critically my note was the thought as well that it would be easier to
    move a qcow2 file to another machine/disk if necessary than to move a
    partition.
    It shouldn't be significantly harder to copy the contents of a partition
    or LV. The block device is a file. You can read its contents to copy
    them just as easily as any other file.
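
    For example, copying an LV to another machine is just a block copy
    (hostnames and LV names are placeholders; the LV must not be in use
    while it is being copied):

    # stream the LV to an identically sized LV on another host
    dd if=/dev/vg0/guest01-disk bs=1M | ssh otherhost 'dd of=/dev/vg0/guest01-disk bs=1M'
    # or dump it to an ordinary file for transport
    dd if=/dev/vg0/guest01-disk of=/backup/guest01-disk.raw bs=1M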
  • Emmanuel Noobadmin at Jun 16, 2011 at 4:01 am

    On 6/16/11, Gordon Messmer wrote:
    I think you were misinformed, or misled.
    That wouldn't be new for me as far as system administration is concerned :D
    LVM should not present any
    noticeable overhead on the host. Using "raw" files to back VMs presents
    a significant overhead to guests; the host performs all IO through its
    filesystem. Using "qcow2" files presents even more overhead (probably
    the most of any configuration) since there are complexities to the qcow2
    file itself in addition to the host's filesystem.
    I was concerned about qcow2 vs raw as well, since it seemed logical
    that qcow2 would be slower given the added functionality. However,
    there was a site I found showing that with KVM and virtio, turning off
    host caching on the file (or specifying write-back instead of the
    default write-through) and preallocating the qcow2 files makes qcow2
    about as fast as raw.
    It shouldn't be significantly harder to copy the contents of a partition
    or LV. The block device is a file. You can read its contents to copy
    them just as easily as any other file.
    Although the combination of ionice and noatime seems to have stopped
    things from going through the roof, I'll probably still try to convert
    one of them to LVM and see if that improves things even further.
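
    The settings referred to there look roughly like this (paths and
    sizes are illustrative, and option availability varies with the
    qemu version):

    # preallocate qcow2 metadata when creating the image
    qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/mailsrv.qcow2 40G
    # and run the guest's virtio disk with writeback caching, e.g.
    # -drive file=/var/lib/libvirt/images/mailsrv.qcow2,if=virtio,cache=writeback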

Discussion Overview
group: centos
categories: centos
posted: Jun 9, '11 at 5:24a
active: Jun 16, '11 at 4:01a
posts: 22
users: 11
website: centos.org
irc: #centos
