Hi,
up until now I've always deployed VMs with their storage located directly
on the host system, but as the number of VMs grows and the hardware becomes
powerful enough to handle more virtual machines, I'm concerned about a
failure of the host taking down too many VMs in one go.
As a result I'm now looking at moving to an infrastructure that uses shared
storage instead, so I can live-migrate VMs or restart them quickly on
another host if the one they are running on dies.
The problem is that I'm not sure how to go about this bandwidth-wise.
What I'm aiming for as a starting point is a 3-4 host cluster with about 10
VMs on each host and a 2 system DRBD based cluster as a redundant storage
backend.
The question that bugs me is how I can get enough bandwidth between the
hosts and the storage to provide the VMs with reasonable I/O performance.
If all 40 VMs start copying files at the same time, the bandwidth share
for each VM would be tiny.
Granted, this is a worst-case scenario, which is why I want to ask whether
anyone here has experience with such a setup and can give recommendations
or comment on alternative setups. Would I maybe get away with 4 bonded
Gbit Ethernet ports, or would I require Fibre Channel or 10 Gbit
infrastructure?
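To put rough numbers on that worst case (assuming ~117 MB/s usable per 1 GbE
link and perfectly balanced bonding, both idealized assumptions of mine):

```python
# Back-of-the-envelope worst case: every VM copying at once over a
# shared 4 x 1 GbE bond (idealized: perfect balancing, no overhead).

GBE_USABLE_MB_S = 117          # roughly what one 1 GbE link delivers
BOND_PORTS = 4
VMS = 40

aggregate_mb_s = GBE_USABLE_MB_S * BOND_PORTS
per_vm_mb_s = aggregate_mb_s / VMS

print(f"aggregate: {aggregate_mb_s} MB/s, per VM: {per_vm_mb_s:.1f} MB/s")
# -> aggregate: 468 MB/s, per VM: 11.7 MB/s
```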

Regards,
Dennis

PS: The sheepdog project (http://www.osrg.net/sheepdog/) looks interesting
in that regard but apparently is still far from production-ready.

  • Christopher G. Stach II at Mar 2, 2010 at 3:34 am

    ----- "Dennis J." wrote:

    > What I'm aiming for as a starting point is a 3-4 host cluster with
    > about 10 VMs on each host and a 2 system DRBD based cluster as a
    > redundant storage backend.

    That's a good idea.

    > The question that bugs me is how I can get enough bandwidth between the
    > hosts and the storage to provide the VMs with reasonable I/O
    > performance.

    You may also want to investigate whether or not a criss-cross replication setup (1A->2a, 2B->1b) is worth the complexity to you. That will spread the load across two DRBD hosts and give you approximately the same fault tolerance at a slightly higher risk. (This is assuming that risk-performance tradeoff is important enough to your project.)

    > If all the 40 VMs start copying files at the same time that would mean
    > that the bandwidth share for each VM would be tiny.

    Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business-load] tasks. These are far cheaper to fix by manually adjusting schedules than any other way, unless you are rolling in dough.

    > Would I maybe get away with 4 bonded gbit ethernet ports? Would I
    > require fiber channel or 10gbit infrastructure?

    Fuck FC, unless you want to get some out-of-date, used, gently broken, or no-name stuff, or at least until FCoE comes out. (You're probably better off getting unmanaged IB switches and using iSER.)

    Can't say if 10GbE would even be enough, but it's probably overkill. Just add up the PCI(-whatever) bus speeds of your hosts, benchmark your current load or realistically estimate what sort of 95th-percentile loads you would have across the board, multiply by that percentage, and fudge that result for SLAs and whatnot. Maybe go ahead and do some FMEA and see if losing a host or two is going to push the others over that bandwidth. If you find that 10GbE may be necessary, a lot of mobos and SuperMicro boards have a better price per port for DDR IB (maybe QDR now) and that may save you some money. Again, probably overkill. Check your math. :)
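
    That recipe can be sketched in a few lines; all the numbers below are placeholders of mine, not measurements (plug in your own benchmark results):

```python
# Crude capacity check following the recipe above: take the 95th
# percentile of measured per-host storage traffic, apply an SLA fudge
# factor, and verify the surviving hosts can absorb a failed host's
# load (a poor man's FMEA pass). All figures are placeholders.

per_host_p95_mb_s = [55, 70, 40, 65]   # measured 95th percentile, MB/s
sla_fudge = 1.3                        # headroom for bursts and SLAs
link_capacity_mb_s = 4 * 117           # e.g. a 4 x 1 GbE bond per host

def ok_after_one_failure(loads, capacity, fudge):
    """True if every single-host failure leaves all survivors under capacity."""
    for i, failed in enumerate(loads):
        survivors = loads[:i] + loads[i + 1:]
        extra = failed / len(survivors)   # failed host's load, spread evenly
        if any((load + extra) * fudge > capacity for load in survivors):
            return False
    return True

print(ok_after_one_failure(per_host_p95_mb_s, link_capacity_mb_s, sla_fudge))
```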

    Definitely use bonding. Definitely make sure you aren't going to saturate the bus that card (or cards, if you are worried about losing an entire adapter) is plugged into. If you're paranoid, get switches that can do bonding across supervisors or across physically separate fixed-configuration switches. If you can't afford those, you may want to opt for 2Nx2N bonding-bridging. That would limit you to probably two quad-port 1 GbE cards per host, just for your SAN, but that's probably plenty. Don't waste your money on iSCSI adapters. Just get ones with TOEs.

    --
    Christopher G. Stach II
    http://ldsys.net/~cgs/
  • Drew at Mar 2, 2010 at 4:27 am
    > Don't waste your money on iSCSI adapters. Just get ones with TOEs.
    Just a point of note: if your hypervisor is derived from Linux
    (excluding some vendors who may have hacked in support), TOE (TCP
    Offload Engine) functions are *not* supported in Linux.


    --
    Drew

    "Nothing in life is to be feared. It is only to be understood."
    --Marie Curie
  • Pasi Kärkkäinen at Mar 3, 2010 at 9:41 am

    On Mon, Mar 01, 2010 at 09:34:45PM -0600, Christopher G. Stach II wrote:
    > ----- "Dennis J." wrote:
    > > What I'm aiming for as a starting point is a 3-4 host cluster with
    > > about 10 VMs on each host and a 2 system DRBD based cluster as a
    > > redundant storage backend.
    > That's a good idea.
    > > The question that bugs me is how I can get enough bandwidth between the
    > > hosts and the storage to provide the VMs with reasonable I/O
    > > performance.
    > You may also want to investigate whether or not a criss-cross replication setup (1A->2a, 2B->1b) is worth the complexity to you. That will spread the load across two drbd hosts and give you approximately the same fault tolerance at a slightly higher risk. (This is assuming that risk-performance tradeoff is important enough to your project.)
    > > If all the 40 VMs start copying files at the same time that would mean
    > > that the bandwidth share for each VM would be tiny.
    > Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business load] tasks. These are far cheaper to fix with manually adjusting schedules than any other way, unless you are rolling in dough.
    > > Would I maybe get away with 4 bonded gbit ethernet ports? Would I
    > > require fiber channel or 10gbit infrastructure?
    > Fuck FC, unless you want to get some out of date, used, gently broken, or no-name stuff, or at least until FCoE comes out. (You're probably better off getting unmanaged IB switches and using iSER.)
    >
    > Can't say if 10GbE would even be enough, but it's probably overkill.
    10 Gbit Ethernet makes sense if you need more than ~110 MB/sec throughput
    with sequential reads/writes at large block sizes; that's about what
    1 Gbit Ethernet can give you.

    If we're talking about random IO, then 1 Gbit ethernet is good/enough
    for many environments.

    Disks are the bottleneck with random IO.
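
    To put numbers on that (rule-of-thumb figures of mine, not measurements: a 7200 rpm disk manages on the order of 100 random IOPS):

```python
# Rough random-I/O throughput of a small spindle array vs. one 1 GbE link.
# All figures are rules of thumb, not benchmarks.

iops_per_disk = 100          # ~7200 rpm SATA disk, random access
disks = 8
io_size_kib = 8              # typical small random I/O request

array_iops = iops_per_disk * disks
throughput_mb_s = array_iops * io_size_kib / 1024   # MB/s

gbe_mb_s = 117
print(f"{array_iops} IOPS ~= {throughput_mb_s:.2f} MB/s "
      f"({throughput_mb_s / gbe_mb_s:.0%} of one 1 GbE link)")
```

    Even eight spindles of random I/O use only a few percent of a single gigabit link, which is why the disks, not the network, are the bottleneck.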

    -- Pasi
  • Benjamin Franz at Mar 3, 2010 at 2:34 pm

    Pasi Kärkkäinen wrote:
    > 10 Gbit Ethernet makes sense if you need over 110MB/sec throughput with
    > sequential reads/writes with large block sizes.. that's what 1 Gbit ethernet
    > can give you.
    You can also bond 1 GbE ports to get higher throughput. Buying an
    Ethernet switch that supports bonded ports is quite a bit cheaper than
    going 10 GbE if you don't need all the way to 10G, while still giving a
    substantial performance boost.
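
    One caveat worth adding: common bonding transmit-hash policies pin each flow to a single slave, so one stream still tops out at one link's speed. A toy model of the default layer-2 hash (slave = (src MAC XOR dst MAC) mod number of slaves):

```python
# Toy model of the Linux bonding driver's default layer-2 transmit hash:
# slave = (last byte of src MAC XOR last byte of dst MAC) mod n_slaves.
# Every packet of a given MAC pair lands on the same slave, so a single
# flow never exceeds one link even on a 4-port bond.

def l2_hash_slave(src_mac_last_byte: int, dst_mac_last_byte: int,
                  n_slaves: int) -> int:
    return (src_mac_last_byte ^ dst_mac_last_byte) % n_slaves

# A single host<->storage MAC pair always picks the same slave:
picks = {l2_hash_slave(0x1A, 0x2B, 4) for _ in range(1000)}
print(picks)   # a single slave index
```

    Bonding therefore aggregates many concurrent flows nicely, but a lone iSCSI session between one host and one target stays at roughly 1 Gbit.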

    --
    Benjamin Franz
  • Grant McWilliams at Mar 3, 2010 at 4:29 pm

    > > If all the 40 VMs start copying files at the same time that would mean
    > > that the bandwidth share for each VM would be tiny.
    > Would they? It's a possibility, and fun to think about, but what are the
    > chances? You will usually run into this with backups, cron, and other
    > scheduled [non-business load] tasks. These are far cheaper to fix with
    > manually adjusting schedules than any other way, unless you are rolling in
    > dough.

    I have a classroom environment where every VM is always doing the same thing
    in lockstep, i.e. formatting partitions, installing software, etc. We hit the
    disk like a bunch of crazy people. I'm replacing my setup with three Intel
    SSDs in a RAID0 exported over either iSCSI or ATAoE. The RAID0 will be synced
    to disk-based storage as backup. We'll see pretty soon how many concurrent
    disk-based operations this setup can handle.

    I'll be bonding 3 or 4 of the iSCSI box's Ethernet cards and then going from
    there to see what each of the servers in the cloud needs as far as its
    connection.

    Grant McWilliams

    Some people, when confronted with a problem, think "I know, I'll use
    Windows."
    Now they have two problems.
  • Christopher G. Stach II at Mar 7, 2010 at 9:00 am

    ----- "Grant McWilliams" wrote:

    > I'm replacing my setup with
    > three Intel SSDs in a RAID0 with either iSCSI or ATAoE. The RAID0 will
    > be synced to a disk based storage as backup. We'll see pretty soon how
    > many concurrent disk based operations this setup can handle.
    I haven't benchmarked anything like that in a while. I'm not saying that RAID 0 with 3 targets is going to be non-performant, but I would expect a parallel array to be better for random ops, unless a classroom workload is sequential for some reason (software installation). Do you have any numbers testing this, or better, real-world stats?

    --
    Christopher G. Stach II
    http://ldsys.net/~cgs/
  • Dennis J. at Mar 7, 2010 at 1:58 pm

    On 03/02/2010 04:51 AM, Ask Bjørn Hansen wrote:
    > On Mar 1, 2010, at 18:56, Dennis J. wrote:
    >
    > > The question that bugs me is how I can get enough bandwidth between the
    > > hosts and the storage to provide the VMs with reasonable I/O performance.
    > > If all the 40 VMs start copying files at the same time that would mean that
    > > the bandwidth share for each VM would be tiny.
    > It really depends on the specific workloads. In my experience it's generally the number of IOs per second rather than the bandwidth that's the limiting factor.
    >
    > We have a bunch of 4-disk boxes with md raid10 and we generally run out of disk IO before we run out of memory (~24-48GB) or CPU (dual quad core 2.26GHz or some such).
    That's very similar to what we are experiencing. The primary problem for me
    is how to deal with the bottleneck of a shared storage setup. The simplest
    setup is a 2-system criss-cross setup where the two hosts also serve as the
    halves of a DRBD cluster. The advantages of this approach are that it's a
    cheap solution, that only part of the storage traffic has to go over the
    network between the machines, and that the network only has to handle the
    storage traffic of the VMs on those two machines.
    The disadvantage of that approach is that you have to keep 50% of potential
    server capacity free in case of a failure of the twin node. That's quite a
    lot of wasted capacity.
    To reduce that problem you can increase the number of hosts to, let's say,
    four, which would reduce the spare capacity needed to 25% on each system,
    but then you really need to separate the storage from the hosts, and now
    you have a bottleneck on the storage end. Increase the number of hosts to 8
    and you get even less wasted capacity but also increase the pressure on the
    storage bottleneck a lot.
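
    (Doing the math: with N equal hosts sized to survive one host failure, each host has to keep 1/N of its capacity free so the survivors can absorb an even share of the failed host's load. A quick sketch:)

```python
# Spare capacity needed per host so the cluster survives one host
# failure: each of N hosts runs at (N-1)/N of capacity, keeping 1/N
# free to absorb an even share of the failed host's load.

def spare_fraction(n_hosts: int) -> float:
    return 1 / n_hosts

for n in (2, 4, 8):
    print(n, f"{spare_fraction(n):.1%}")
# 2 hosts -> 50.0%, 4 hosts -> 25.0%, 8 hosts -> 12.5%
```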
    Since I'm new to the whole SAN aspect I'm currently just looking at all the
    options that are out there, and basically wonder how the big boys who have
    hundreds if not thousands of VMs running are handling this, since they also
    need to be able to deal with physical failures.
    That is why I find the sheepdog project so interesting: it seems to address
    this particular problem in a way that would provide almost linear
    scalability without actually using a SAN at all (well, at least not in the
    traditional sense of the word).

    Regards,
    Dennis

Discussion Overview
group: centos-virt @ centos
posted: Mar 2, '10 at 2:56a
active: Mar 7, '10 at 1:58p
posts: 8
users: 6
website: centos.org
irc: #centos

site design / logo © 2022 Grokbase