My NFS setup is a heartbeat pair: two servers running Active/Passive
DRBD. The NFS servers themselves are single dual-core Opterons with
8 GB of RAM and 5 TB of space across 16 drives on a 3ware controller.
They're connected to an HP ProCurve switch with bonded ethernet. The
sync rates between the two DRBD nodes safely reach 200 Mbps or better.
The processors on the active NFS server run with a load of 0.2, so it
seems mighty healthy. Until I do a serious backup.
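
For context, the DRBD resource is configured roughly like this
(resource name, devices, addresses, and the syncer rate here are
placeholders, not my real values):

resource r0 {
  protocol C;                  # synchronous replication
  syncer { rate 30M; }         # resync throttle; placeholder value
  on nfs1 {
    device /dev/drbd0; disk /dev/sdb1;
    address 192.168.1.1:7788; meta-disk internal;
  }
  on nfs2 {
    device /dev/drbd0; disk /dev/sdb1;
    address 192.168.1.2:7788; meta-disk internal;
  }
}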

I have a few load-balanced web nodes and two database nodes as NFS
clients. When I start backing up my database to a mounted NFS
partition, a plain rsync drives the NFS box's load through the roof
and forces a failover. I can do my backup using --bwlimit=1500, but
then I'm not anywhere close to a fast backup, just 1.5 MBps. My
backups are probably 40 GB. (The database has fast disks, and
database-to-database copies run at up to 60 MBps, close to 500 Mbps.)
So I obviously do not have a networking issue.
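
The backup command is essentially this (paths are placeholders):

$ rsync -a --bwlimit=1500 /data/db-dumps/ /mnt/backups/db/

(rsync's --bwlimit is in KBps, so 1500 is the 1.5 MBps above.)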

The processor loads up like this:
bwlimit 1500 load 2.3
bwlimit 2500 load 3.5
bwlimit 4500 load 5.5+

The DRBD secondary seems to run at about 1/2 the load of the primary.

What I'm wondering is: why is this thing *so* load sensitive? Is it
DRBD? Is it NFS? I'm guessing that since I only have two cores in the
NFS boxes, a prolonged transfer lets NFS dominate one core and DRBD
dominate the other, and so I'm saturating my processors.

Thoughts?

Jed


  • Ugo Bellavance at Jan 6, 2008 at 9:03 pm

    Jed Reynolds wrote:
    [...]

    What I'm wondering is: why is this thing *so* load sensitive? Is it
    DRBD? Is it NFS? I'm guessing that since I only have two cores in
    the NFS boxes, a prolonged transfer lets NFS dominate one core and
    DRBD dominate the other, and so I'm saturating my processors.
    Is your CPU usage 100% all the time?


    Can you send us the output of vmstat -n 5 5
    when you're doing a backup?

    Regards,

    Ugo
  • Jed Reynolds at Jan 7, 2008 at 6:49 am

    Ugo Bellavance wrote:
    [...]
    Is your CPU usage 100% all the time?
    Not 100% user or 100% system, not even close. Wow, that looks like
    a lot of I/O wait time to me, actually.

    Looking at the stats below, I'd think that with so much wait time
    it's either disk or network latency. I wonder if packets going
    through the DRBD device are the wrong size? Or the DRBD device is
    waiting on a response from the secondary? Seems strange.

    The only other thing running on that system is memcached, which
    uses 11% CPU and has about 200 connections open from other hosts.
    There were 8 nfsd instances.
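
    If those 8 nfsd threads turn out to be the bottleneck, I could
    raise the count; on CentOS that is set in /etc/sysconfig/nfs
    (assuming the stock init scripts), e.g.:

    RPCNFSDCOUNT=16

    followed by a "service nfs restart".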
    Can you send us the output of vmstat -n 5 5
    when you're doing a backup?
    This is with rsync at bwlimit=4500.

    top - 22:37:23 up 3 days, 10:07, 4 users, load average: 4.67, 2.37, 1.30
    Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
    Cpu0 : 0.3% us, 1.3% sy, 0.0% ni, 9.3% id, 87.7% wa, 0.3% hi, 1.0% si
    Cpu1 : 0.0% us, 3.3% sy, 0.0% ni, 8.0% id, 83.7% wa, 1.7% hi, 3.3% si
    Mem: 8169712k total, 8148616k used, 21096k free, 296636k buffers
    Swap: 4194296k total, 160k used, 4194136k free, 6295284k cached

    $ vmstat -n 5 5
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
     0 10    160  24136 304208 6277104    0    0    95    38    22    63  0  2 89  9
     0 10    160  28224 304228 6277288    0    0    36    64  2015   707  0  3  0 97
     0  0    160  28648 304316 6280328    0    0   629    28  3332  1781  0  4 65 31
     0  8    160  26784 304384 6283388    0    0   629   106  4302  3085  1  5 70 25
     0  0    160  21520 304412 6287304    0    0   763   104  3487  1944  0  4 78 18

    $ vmstat -n 5 5
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
     0  0    160  26528 301516 6287820    0    0    95    38    22    63  0  2 89  9
     0  0    160  21288 301600 6292768    0    0   999    86  4856  3273  0  2 87 11
     2  8    160  19408 298304 6283960    0    0   294 15293 33983 15309  0 22 53 25
     0 10    160  28360 298176 6281232    0    0    34   266  2377   858  0  2  0 97
     0 10    160  33680 298196 6281552    0    0    32    48  1937   564  0  1  4 96
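
    Next time I'll also watch the disks directly during a transfer,
    with something like:

    $ iostat -x 5

    (iostat is in the sysstat package; the extended output shows
    per-device await and %util, which should separate disk latency
    from network/DRBD latency.)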
  • Jed Reynolds at Jan 7, 2008 at 7:12 am

    Jed Reynolds wrote:
    Ugo Bellavance wrote:
    Can you send us the output of vmstat -n 5 5
    when you're doing a backup?
    This is with rsync at bwlimit=4500.
    This is the same transfer done over SSH. The load still climbs...
    and then it drops. I think NFS is the issue.

    I wonder if my NFS mount settings in the client fstabs are unwise?
    I figured that with a beefy machine and fast networking I could
    take advantage of large packet sizes. Bad packet sizes?

    rw,hard,intr,rsize=16384,wsize=16384
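
    The full fstab entry is along these lines (server name and mount
    point are placeholders):

    nfs-server:/backups /mnt/backups nfs rw,hard,intr,rsize=16384,wsize=16384 0 0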


    top - 23:04:35 up 3 days, 10:34, 4 users, load average: 4.08, 3.06, 2.81
    Tasks: 132 total, 1 running, 131 sleeping, 0 stopped, 0 zombie
    Cpu0 : 5.7% us, 1.7% sy, 0.0% ni, 72.0% id, 19.3% wa, 0.7% hi, 0.7% si
    Cpu1 : 1.3% us, 3.0% sy, 0.0% ni, 38.4% id, 51.0% wa, 0.7% hi, 5.6% si
    Mem: 8169712k total, 8149288k used, 20424k free, 162628k buffers
    Swap: 4194296k total, 160k used, 4194136k free, 6374960k cached

    then

    top - 23:08:49 up 3 days, 10:39, 4 users, load average: 0.89, 1.86, 2.38
    Tasks: 129 total, 1 running, 128 sleeping, 0 stopped, 0 zombie
    Cpu0 : 5.2% us, 2.8% sy, 0.0% ni, 63.7% id, 23.4% wa, 1.2% hi, 3.8% si
    Cpu1 : 1.2% us, 3.2% sy, 0.0% ni, 65.9% id, 27.3% wa, 1.0% hi, 1.4% si
    Mem: 8169712k total, 8149512k used, 20200k free, 141388k buffers
    Swap: 4194296k total, 160k used, 4194136k free, 6388856k cached


    $ vmstat -n 5 5
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
     0  0    160  18712 155060 6383956    0    0    96    45    42    70  0  2 89  9
     0  0    160  20128 154328 6382988    0    0   421  2578  7622  2433  3  4 64 29
     0  0    160  18192 153920 6384076    0    0   126  2498  7116  2238  3  6 72 19
     0  1    160  22872 153684 6380640    0    0   110  2451  7065  2063  3  4 64 28
     0  0    160  23880 153416 6379752    0    0    34  2520  7091  2506  3  4 68 25
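
    If packet size really is the culprit, the test would be to remount
    a client with smaller values and rerun the transfer, something like
    (server name and mount point are placeholders):

    $ umount /mnt/backups
    $ mount -t nfs -o rw,hard,intr,rsize=8192,wsize=8192 nfs-server:/backups /mnt/backups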
  • Ugo Bellavance at Jan 7, 2008 at 3:39 pm

    Jed Reynolds wrote:
    [...]

    I wonder if my NFS mount settings in the client fstabs are unwise?
    I figured that with a beefy machine and fast networking I could
    take advantage of large packet sizes. Bad packet sizes?

    Are you backing up NFS to NFS? From where to where are you doing
    the backups?
  • Jed Reynolds at Jan 7, 2008 at 4:07 pm

    Ugo Bellavance wrote:
    [...]

    Are you backing up NFS to NFS? From where to where are you doing
    the backups?
    The source data is on an ext3 partition on an LVM volume, backed by
    a 15krpm RAID 10 array. Both rsyncs were run from the source host
    (the db server) to the backup server (which hosts the NFS exports).
    In the NFS backup I was rsyncing from the db filesystem to an NFS
    mount, and in the SSH backup I was rsyncing from the db filesystem
    to [email protected]:/backups.
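
    Concretely, the two invocations were along these lines (local
    paths are placeholders; exact flags elided):

    $ rsync -a /data/db-dumps/ /mnt/backups/db/              # NFS mount
    $ rsync -a /data/db-dumps/ [email protected]:/backups/db/    # over ssh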

    Jed
  • Ugo Bellavance at Jan 7, 2008 at 4:42 pm

    Jed Reynolds wrote:
    [...]

    The source data is on an ext3 partition on an LVM volume, backed by
    a 15krpm RAID 10 array. Both rsyncs were run from the source host
    (the db server) to the backup server (which hosts the NFS exports).
    In the NFS backup I was rsyncing from the db filesystem to an NFS
    mount, and in the SSH backup I was rsyncing from the db filesystem
    to [email protected]:/backups.
    OK, so the DB resides locally on the db server, and you are copying
    it onto the NFS mount only during backups?
  • Jed Reynolds at Jan 7, 2008 at 6:00 pm

    Ugo Bellavance wrote:
    [...]

    OK, so the DB resides locally on the db server, and you are copying
    it onto the NFS mount only during backups?
    Correct.
