FAQ

[Catalyst] Clustering catalyst apps

Gert Burger
May 8, 2006 at 11:41 am
Hi

I was wondering a few days ago how one would create a cluster of
Catalyst webapps?

Some of my early thoughts included just having multiple machines
running Apache behind a load balancer.

But then you still have a single point of failure: the load balancer.


Another problem is that if you use some sort of database to store your
sessions etc., you then have another point of failure.

Therefore, how can an average small company improve their (Catalyst)
webapps' reliability without breaking the budget?

Gert Burger

  • Peter Edwards at May 8, 2006 at 12:26 pm
    Set up the DNS for your application to map to multiple IP addresses, one
    each for however many web server machines you need. Run your perl apps on
    those.

    Have a single database server machine with RAID mirrored disks. Have your
    perl apps connect to that.
    Regularly backup your database across the net to a disaster recovery (DR)
    machine at a different physical location. With mysql you can do a hotcopy
    then rsync the files across. Set up the DR server so it can also be a web
    server.
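
    As a rough sketch of that nightly hotcopy-and-rsync step (the database
    name, backup directory and DR hostname below are just placeholders, and
    mysqlhotcopy is assumed to pick up its credentials from ~/.my.cnf):

        #!/usr/bin/perl
        # Nightly backup: hotcopy the database locally, then push the copy to the DR host.
        use strict;
        use warnings;

        my $db         = 'myapp';                         # placeholder database name
        my $backup_dir = '/var/backups/mysql';            # local staging area (placeholder)
        my $dr_target  = 'backup@dr.example.com:/var/backups/mysql/';   # placeholder DR box

        # mysqlhotcopy needs a MySQL user with SELECT, RELOAD and LOCK TABLES rights;
        # --allowold keeps the previous copy around until this one succeeds.
        system('mysqlhotcopy', '--allowold', $db, $backup_dir) == 0
            or die "mysqlhotcopy failed: $?";

        # rsync the copy across: -a preserves permissions, -z compresses over the wire.
        system('rsync', '-az', '--delete', "$backup_dir/", $dr_target) == 0
            or die "rsync failed: $?";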

    Failures:

    Disk - switch to mirror until you can replace the disk

    Database host - switch your web apps to the DR server for database access;
    have an application strategy on what to do with delayed transactions that
    happened since the last database synchronisation [1]

    Network/Datacentre - point DNS to DR server and use its web server (poor
    performance, but at least limited access is available)

    Assuming you've got your servers in a data centre with triple connections to
    the Internet backbone, this last scenario is very unlikely.

    A lot depends on how many users, how critical up-time is, what the cost
    equation is between having an alternative site and hardware versus the
    opportunity cost of lost sales and damaged reputation. The above works well
    for 10-150 concurrent users. For more you could consider using the
    clustering and failover features that come with some databases.

    [1] For example, if you manage to recover the transaction log from the main
    db server you can merge the records in later provided your app hasn't
    allocated overlapping unique ids to its record keys.
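
    One cheap way to guarantee that with MySQL 5.x is to give each server its
    own auto_increment "lane", odd ids on the primary and even ids on the DR
    box. A sketch of doing it over DBI; the host and credentials are made up,
    and in practice you would persist these settings in my.cnf:

        #!/usr/bin/perl
        # Give the DR server its own auto_increment "lane" (even ids) so it can never
        # hand out a key the primary (odd ids, offset 1) has already used.
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect(
            'dbi:mysql:host=dr.example.com',   # placeholder host
            'admin', 'secret',                 # placeholder credentials
            { RaiseError => 1 },
        );

        $dbh->do('SET GLOBAL auto_increment_increment = 2');
        $dbh->do('SET GLOBAL auto_increment_offset    = 2');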

    Regards, Peter

  • Gert Burger at May 8, 2006 at 12:45 pm
    Thanks for the reply, here are some of my comments on this:

    Using round-robin DNS still means that if 50% of the servers are down,
    50% of all queries will go to the broken machines, which will piss off
    half your customers.

    I have looked at the high-availability systems that have been written
    for Linux, and they provide doubles (or more) of everything, from load
    balancers to DB servers. The issue I have with them is that they require
    a great deal of money in hardware to get running.

    In any case, back to my issue: how do websites like Slashdot and Amazon,
    both of which use Perl, keep uptimes of close to 99.999%?

    And is it possible to get to that level with lots of crappy hardware?

    Cheers

    PS. Excuse me for meddling with the semi-impossible.
  • Joe Landman at May 8, 2006 at 12:55 pm

    Gert Burger wrote:
    Thanks for the reply, here are some of my comments on this:

    Using round-robin DNS still means that if 50% of the servers are down,
    50% of all queries will go to the broken machines, which will piss off
    half your customers.
    Hmmm... with dns proxies like dnsmasq and friends, this should not be an
    issue.
    I have looked at the high-availability systems that have been written
    for Linux and they provide doubles (or more) of everything, from load
    balancers to DB servers. The issue I have with them is that they require
    a great deal of money in hardware to get running.
    If you want highly available systems, this will cost you.
    In any case, back to my issue: how do websites like Slashdot and Amazon,
    both of which use Perl, keep uptimes of close to 99.999%?
    Designs with no single points of failure. Whether they are highly
    available may be open to interpretation, but if you are going to stand
    up a resource for use where the cost of being down (either economic or
    equivalent cost) or the risk of unavailability is high, you are going to
    want to make sure you have no single points of failure anywhere in your
    process.
    And is it possible to get to that level with lots of crappy hardware?
    Heh. No.

    Crappy hardware is as its name implies.

    If you want highly reliable stuff, you are going to need to purchase
    non-crappy hardware. This doesn't mean expensive hardware, just don't
    buy the obvious crap. Lots of hardware out there is crappy. Dealing
    with such hardware is a nightmare. In many cases it would cost you less
    to throw it away and start with non-crappy hardware.

    You need to design so that single or multiple failures will not take
    down everything. Also, you need to design for active monitoring, simple
    start/stop mechanisms, and the like.

    A nice DB system is indicated; MySQL/PostgreSQL should be fine. We use
    SQLite3 for some of our stuff and shuttle the DB around, as it is small
    enough for us to do this with.

    Joe
    --
    Joseph Landman, Ph.D
    Founder and CEO
    Scalable Informatics LLC,
    email: landman at scalableinformatics.com
    web : http://www.scalableinformatics.com
    phone: +1 734 786 8423
    fax : +1 734 786 8452
    cell : +1 734 612 4615
  • Aristotle Pagaltzis at May 8, 2006 at 12:58 pm

    * Gert Burger [2006-05-08 14:55]:
    In any case, back to my issue: how do websites like Slashdot and
    Amazon, both of which use Perl, keep uptimes of close to 99.999%?
    What does the fact that they use Perl have to do with their load
    balancing? It's a red herring, isn't it?
    And is it possible to get to that level with lots of crappy hardware?
    Might http://www.danga.com/perlbal/ help?

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Len Jaffe at May 8, 2006 at 12:59 pm

    --- Gert Burger wrote:

    Thanks for the reply, here are some of my comments on this:

    I have looked at the high-availability systems that have been written
    for Linux and they provide doubles (or more) of everything, from load
    balancers to DB servers. The issue I have with them is that they
    require a great deal of money in hardware to get running.

    In any case, back to my issue: how do websites like Slashdot and
    Amazon, both of which use Perl, keep uptimes of close to 99.999%?
    They use redundant hardware load balancers to front-end everything.

    What you spend to maintain uptime will depend on the cost of downtime.

    Len.
  • Perrin Harkins at May 8, 2006 at 3:09 pm

    On Mon, 2006-05-08 at 14:45 +0200, Gert Burger wrote:
    I have looked at the high-availability systems that have been written
    for Linux and they provide doubles (or more) of everything, from load
    balancers to DB servers. The issue I have with them is that they require
    a great deal of money in hardware to get running.
    You can't get high-availability for nothing.
    In any case, back to my issue: how do websites like Slashdot and Amazon,
    both of which use Perl, keep uptimes of close to 99.999%?

    And is it possible to get to that level with lots of crappy hardware?
    I don't think Slashdot can be considered highly available, but that's
    beside the point. Yahoo, Google, etc. all get high-availability on
    mostly cheap hardware, but they have the scale to buy lots of it and put
    a lot of effort into making it work. If your budget is relatively low,
    you will probably get more reliability by spending more on your key
    components (the database server, the load balancer), since you won't
    want to pay for lots of redundant hardware. In other words, if you
    aren't willing to buy doubles of everything, buy better hardware so it
    is less likely to fail.

    You can't expect miracles though -- real high availability is achieved
    by having redundant hardware, hiring skilled personnel, and repeatedly
    testing your failover plan. That's how the companies you mentioned do
    it.

    Slashdot doesn't need real high-availability so they have adopted a
    strategy that might be more applicable to you, i.e. better hardware but
    less of it. It's described here:
    http://slashdot.org/faq/tech.shtml#te050

    - Perrin
  • Peter Edwards at May 8, 2006 at 3:30 pm
    Hi Gert, I think the key here is "average small company".

    My gut feel is that two relatively cheap rented servers at different
    datacentres using an (admittedly) crude DNS approach is enough to run most
    small companies' web services reliably and cheaply.

    I'm not sure if you need more. The sort of questions I'd ask about your
    customer are:
    How much money are they willing to spend per month?
    Is it online retail or chat-based or some other service?
    How many transactions per hour are they handling?
    Is timeliness critical?
    Do they handle a few big customers who need perfect service, or many smaller
    customers where they can afford to lose a few due to downtime?

    Assuming you do need more, there are a couple of aspects
    1) Scalability
    2) Reliability

    Scalability.
    The model I suggested lets you scale up to about 150 concurrent users. Most
    small companies would be delighted to have that many :)

    Reliability.
    High availability doesn't necessarily mean that you have to have 100% of
    your system's functionality on-line immediately. For example, if you still
    have the web pages coming up, can see contact names and phone numbers, but
    maybe it takes 30 minutes for the latest orders to reappear (via your
    recovery process) then for many users that is going to be acceptable and not
    considered an outage in service.
    If you run a watchdog process to flip the DNS on failure and initiate a DR
    process (or use something like perlbal that Aristotle suggested) you can
    have a high *perceived* uptime approaching what your customer asked for.
    Write the SLA in your support contract carefully and that might be enough.
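
    A bare-bones sketch of such a watchdog, assuming some external script
    (the /usr/local/bin/switch_to_dr.sh below is made up) actually flips the
    DNS and starts the DR process:

        #!/usr/bin/perl
        # Poll the live site; after a few consecutive failures, kick off the DR switch.
        use strict;
        use warnings;
        use LWP::UserAgent;

        my $check_url = 'http://www.example.com/';    # placeholder URL to probe
        my $ua        = LWP::UserAgent->new( timeout => 10 );
        my $failures  = 0;

        while (1) {
            my $res = $ua->get($check_url);
            $failures = $res->is_success ? 0 : $failures + 1;

            if ( $failures >= 3 ) {     # three strikes, i.e. a few minutes of downtime
                system('/usr/local/bin/switch_to_dr.sh') == 0
                    or warn "DR switch script failed: $?";
                last;                   # a human takes over from here
            }
            sleep 60;
        }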

    To achieve a real 99.99% uptime with 100% functionality is going to cost you
    a lot more... design time, testing, hardware, network, database licences,
    monitoring, support staff.

    Put it another way, are you selling them a "Rolls Royce" solution or a
    diesel van? I know which most small businesses are going to go for. Of
    course they'd love a Rolls Royce - as long as you pay for it - but all their
    similar-sized competitors are driving diesel vans.

    I'm not suggesting you skimp on remote monitoring, or the use of a TPM if
    you really need it, just that a combination of KISS and customer expectation
    management will save you money and trouble.

    Regards, Peter
    www.dragonstaff.com

  • Wade Stuart at May 8, 2006 at 6:06 pm

    catalyst-bounces at lists.rawmode.org wrote on 05/08/2006 10:30:03 AM:

    Hi Gert, I think the key here is "average small company".

    My gut feel is that two relatively cheap rented servers at different
    datacentres using an (admittedly) crude DNS approach is enough to run most
    small companies' web services reliably and cheaply.

    I'm not sure if you need more. The sort of questions I'd ask about your
    customer are:
    How much money are they willing to spend per month?
    Is it online retail or chat-based or some other service?
    How many transactions per hour are they handling?
    Is timeliness critical?
    Do they handle a few big customers who need perfect service, or many smaller
    customers where they can afford to lose a few due to downtime?

    Assuming you do need more, there are a couple of aspects
    1) Scalability
    2) Reliability


    Another thing you can do, and I have actually just seen this done first
    hand (sad), is figure out the lowest percentage uptime that customers
    will not baulk at for an SLA.

    Let's say you choose 94% uptime.

    Now, figure in how many hours your environment will be down per week.

    68 hours?

    OK, that puts you way below 94% uptime. So now start removing hours from
    a 24x7 week.

    Start with the weekends -- they are expendable (who uses applications on
    the weekend?) -- and that gets you 48 "free" hours of downtime per week.

    94% on 24x5? But you are still way below your 94% uptime SLA number, so
    just calculate how many hours you need to drop from 24x5 to get there. In
    this case, about two per day.

    So put 94% uptime based on 22x5 in your SLA.

    Sounds good on paper; when you do the math, though, it looks like 61.3%
    uptime on a 24x7 schedule.
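
    Spelled out, the arithmetic behind that figure looks roughly like this:

        #!/usr/bin/perl
        # "94% uptime on a 22x5 window" restated against the full 24x7 week.
        use strict;
        use warnings;

        my $week_hours   = 24 * 7;      # 168 hours in a real week
        my $window_hours = 22 * 5;      # the hours the SLA actually covers
        my $sla          = 0.94;

        my $hours_up    = $window_hours * $sla;   # minimum uptime that still meets the SLA
        my $real_uptime = $hours_up / $week_hours;

        printf "SLA can be met with as little as %.1f hours up = %.1f%% real uptime\n",
            $hours_up, 100 * $real_uptime;        # roughly the 61% figure quoted above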


    Of course I am being facetious; I have just had these exact numbers given
    to me for one of our vendors' services -- gogo outsourcing.

    Wade
  • Len Jaffe at May 8, 2006 at 6:28 pm

    --- Wade.Stuart at fallon.com wrote:


    Of course I am being facetious, I have just had
    these exact numbers given
    to me for one of our vendors services -- gogo
    outsourcing.
    As in "GoGo outsource to another vendor."
  • Matt S Trout at May 8, 2006 at 6:16 pm

    Gert Burger wrote:
    Thanks for the reply, here are some of my comments on this:

    Using round-robin DNS still means that if 50% of the servers are down,
    50% of all queries will go to the broken machines, which will piss off
    half your customers.

    I have looked at the high-availability systems that have been written
    for Linux and they provide doubles (or more) of everything, from load
    balancers to DB servers. The issue I have with them is that they require
    a great deal of money in hardware to get running.
    Not really; if you're buying a fair few at once, people like Dell will do
    you pretty damn good deals on moderate 1U servers.

    In the end though, you're probably going to have a single point of
    failure somewhere - e.g. "we have two of every piece of kit but only one
    backup generator for when the UPSen run out", or at the very least
    "there's only one planet earth so if there's an extinction level
    asteroid strike ..." :)

    The point here is to push that single point of failure back as far as is
    cost-effective. For a small company, making sure your border router is
    bloody good and using two servers with a single service IP and failover
    is often good enough.
  • Dave C at May 8, 2006 at 8:14 pm

    On 5/8/06, Gert Burger wrote:
    Thanks for the reply, here are some of my comments on this:
    Disclaimer: I work for a large hosting company (shameless:
    http://www.hostway.com) and I specialize in designing highly available
    clusters for large customers using all Open Source, freely available
    software running on both (depending on the customer) "crappy" and
    non-crappy systems (we host parts of foxnews.com, orbitz, Wikipedia,
    and others).

    The key to offering "five nines" availability (99.999%, or about five
    minutes a year) is to examine faults in every aspect, including
    application, hardware, network, facility, and OS to identify single
    points of failure. Then, just design around them. Even down to such
    details as plugging servers into different power strips on separate
    phases (may seem obvious, but you'd be surprised what I've seen bring a
    cluster down), and using IP addresses located on different subnets,
    etc.

    On a larger scale, we happen to offer a global caching platform
    similar to Akamai built on pure Open Source software which will route
    around an entire data center going offline (we have ten different data
    centers).
    Using round-robin DNS still means that if 50% of the servers are down,
    50% of all queries will go to the broken machines, which will piss off
    half your customers.
    Not necessarily. Both google.com and yahoo.com use RR DNS:

    host www.google.com
    www.google.com is an alias for www.l.google.com.
    www.l.google.com has address 64.233.161.99
    www.l.google.com has address 64.233.161.104
    www.l.google.com has address 64.233.161.14

    host www.yahoo.com
    www.yahoo.com is an alias for www.yahoo.akadns.net.
    www.yahoo.akadns.net has address 68.142.226.41
    www.yahoo.akadns.net has address 68.142.226.32
    www.yahoo.akadns.net has address 68.142.226.38
    www.yahoo.akadns.net has address 68.142.226.52
    www.yahoo.akadns.net has address 68.142.226.34
    www.yahoo.akadns.net has address 68.142.226.37
    www.yahoo.akadns.net has address 68.142.226.53
    www.yahoo.akadns.net has address 68.142.226.55

    However, they lower the TTL on the records to under 60 seconds, which
    allows changes to be made quickly. Using monitoring software like
    Nagios or monit, or your own checks written with
    Test::WWW::Mechanize::Catalyst, one could connect to the application on
    each address and, if there is an error, yank that IP from DNS.
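
    A rough sketch of that kind of per-address check (the hostname and IPs
    below are placeholders, and the actual "yank it from DNS" step is left to
    whatever manages your zone):

        #!/usr/bin/perl
        # Probe the application on every address behind the round-robin name and
        # flag the ones that fail so they can be pulled out of DNS.
        use strict;
        use warnings;
        use HTTP::Request;
        use LWP::UserAgent;

        my $site = 'www.example.com';                        # the round-robin name (placeholder)
        my @ips  = qw( 192.0.2.10 192.0.2.11 192.0.2.12 );   # its A records (placeholders)

        my $ua = LWP::UserAgent->new( timeout => 10 );

        for my $ip (@ips) {
            # Connect to one specific backend but send the real Host header so
            # name-based virtual hosting still serves the right site.
            my $req = HTTP::Request->new( GET => "http://$ip/" );
            $req->header( Host => $site );

            my $res = $ua->request($req);
            next if $res->is_success;
            print "$ip failed: ", $res->status_line, " - candidate to pull from DNS\n";
        }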
    In any case, back to my issue: how do websites like Slashdot and Amazon,
    both of which use Perl, keep uptimes of close to 99.999%?
    They use multiple layers of redundancy. As I outlined above, the first
    point would be RR DNS; then each of the IPs returned is connected to
    some sort of load balancer (hardware possibly using BigIP, Foundry, or
    Cisco gear, software using LVS). There's some reverse proxying being
    done, connecting to query caches for database-intensive work, then
    returning the request back to the client.

    For a good outline of how LiveJournal uses open source software for
    high availablity, check
    http://www.danga.com/words/2004_oscon/oscon2004.pdf
    And is it possible to get to that level with lots of crappy hardware?
    Yes, Google actually designs around this. They don't even use
    hardware RAID in their systems and are said to use commodity equipment
    costing roughly $1000/piece.
    http://www.internetnews.com/xSP/article.php/3487041

    dave.
  • Johan Lindström at May 8, 2006 at 9:12 pm

    At 22:14 2006-05-08, Dave C wrote:
    The key to offering "five nines" availability (99.999%, or about five
    minutes a year) is to examine faults in every aspect, including
    application, hardware, network, facility, and OS to identify single
    points of failure.
    If you go that far, don't forget to make sure your two independent ISPs
    really are independent and don't buy their upstream bandwidth from the same
    provider :)

    That happened to us a couple of years ago; the upstream provider had some
    downtime and we were mightily upset.


    /J
  • Matt S Trout at May 8, 2006 at 9:46 pm

    Johan Lindström wrote:
    At 22:14 2006-05-08, Dave C wrote:
    The key to offering "five nines" availability (99.999%, or about five
    minutes a year) is to examine faults in every aspect, including
    application, hardware, network, facility, and OS to identify single
    points of failure.
    If you go that far, don't forget to make sure your two independent ISPs
    really are independent and don't buy their upstream bandwidth from the same
    provider :)

    That happened to us a couple of years ago; the upstream provider had some
    downtime and we were mightily upset.
    At $ork[-mumble], we had two links, both physically entirely separate
    (our BNetworkAdminFH had ensured that they even went out different sides
    of the building). Unfortunately, one time some bastards half-filled a
    bunch of wheely-bins with petrol, lit them, waited a few seconds, then
    emptied the bins down carefully-chosen manholes over comms line
    intersections, reducing fibre-optic bundles to slag quickly and
    effectively. They got about 80% of the major intersections in the area,
    naturally including both our lines.

    Sometimes even not having a single point of failure won't save you.
  • Jules Agee at May 9, 2006 at 6:31 pm

    Dave C wrote:
    On a larger scale, we happen to offer a global caching platform
    similar to Akamai built on pure Open Source software which will route
    around an entire data center going offline (we have ten different data
    centers).
    Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
    interesting solution for inexpensive http reverse-proxy, failover, load
    balancing, ssl wrapper, etc.

    --
    Jules Agee
    System Administrator
    Pacific Coast Feather Co.
    julesa at pcf.com x284
  • Matt S Trout at May 9, 2006 at 6:44 pm

    Jules Agee wrote:
    Dave C wrote:
    On a larger scale, we happen to offer a global caching platform
    similar to Akamai built on pure Open Source software which will route
    around an entire data center going offline (we have ten different data
    centers).
    Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
    interesting solution for inexpensive http reverse-proxy, failover, load
    balancing, ssl wrapper, etc.
    Looks interesting, although I think for serious scaling I'd probably
    prefer perlbal due to its being somewhat smarter (and easier to extend).
    Probably worth trialling and benchmarking both, mind.
  • Michael Alan Dorman at May 10, 2006 at 2:12 pm

    Jules Agee <julesa at pcf.com> writes:
    Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
    interesting solution for inexpensive http reverse-proxy, failover, load
    balancing, ssl wrapper, etc.
    I've used pound on a moderate-traffic (2M hits/day) site for about the
    last three years. It is capable, simple to set up and reliable; I had
    to log in to the proxy server to see when it had last been restarted
    (Jan 30, incidentally).

    I do wish it wasn't so ardent about spamming the logs about every
    dropped connection---on a site with any traffic, this happens a lot,
    so it's annoying.

    Mike
    --
    Give me a Leonard Cohen afterworld
  • Matt S Trout at May 10, 2006 at 3:00 pm

    Michael Alan Dorman wrote:
    Jules Agee <julesa at pcf.com> writes:
    Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
    interesting solution for inexpensive http reverse-proxy, failover, load
    balancing, ssl wrapper, etc.
    I've used pound on a moderate-traffic (2M hits/day) site for about the
    last three years. It is capable, simple to setup and reliable---I had
    to login to the proxy server to see when it had last been restarted
    (Jan 30, incidentally).

    I do wish it wasn't so ardent about spamming the logs about every
    dropped connection---on a site with any traffic, this happens a lot,
    so it's annoying.
    So patch it :D, it's open source after all.
  • Len Jaffe at May 10, 2006 at 3:09 pm

    --- Matt S Trout wrote:

    Michael Alan Dorman wrote:
    I do wish it wasn't so ardent about spamming the
    logs about every
    dropped connection---on a site with any traffic,
    this happens a lot,
    so it's annoying.
    So patch it :D, it's open source after all.
    Don't you know? You aren't really supposed to patch
    open source code. Only free software. If you patch
    open source code, the maintainers roll their eyes at
    you and sigh or mutter.
  • Matt S Trout at May 10, 2006 at 3:28 pm

    Len Jaffe wrote:
    Don't you know? You aren't really supposed to patch
    open source code. Only free software. If you patch
    open source code, the maintainers roll their eyes at
    you and sigh or mutter.
    I thought that was the unquiet dead that did that.

    Anyway, it's GPL, so it's free software too. *disappears in a puff of logic*
  • Dave Hodgkinson at May 10, 2006 at 3:35 pm

    On 10 May 2006, at 16:28, Matt S Trout wrote:

    Len Jaffe wrote:
    Don't you know? You aren't really supposed to patch
    open source code. Only free software. If you patch
    open source code, the maintainers roll their eyes at
    you and sigh or mutter.
    I thought that was the unquiet dead that did that.

    Anyway, it's GPL, so it's free software too. *disappears in a puff
    of logic*
    Given that it's *years* since I did a diff in anger, a quick recipe
    on how to submit patches would be welcome...

    --
    Dave Hodgkinson - Music photography
    http://www.hodgkinson.org/
  • Matt S Trout at May 10, 2006 at 5:11 pm

    Dave Hodgkinson wrote:
    On 10 May 2006, at 16:28, Matt S Trout wrote:

    Len Jaffe wrote:
    Don't you know? You aren't really supposed to patch
    open source code. Only free software. If you patch
    open source code, the maintainers roll their eyes at
    you and sigh or mutter.
    I thought that was the unquiet dead that did that.

    Anyway, it's GPL, so it's free software too. *disappears in a puff
    of logic*
    Given that it's *years* since I did a diff in anger, a quick recipe
    on how to submit patches would be welcome...
    If they have an svn repo I usually check it out, edit in place and send
    an svk diff

    If not, unpacking the tar, cp -pR ing it to -orig, and doing a diff -ur
    across the two dirs when done seems to be ok. However, a lot of authors
    are *incredibly* picky about what diff options you use, so I find you
    usually have to re-send at least once with some option you've never
    heard of added to the list :)
  • Len Jaffe at May 10, 2006 at 5:34 pm

    --- Matt S Trout wrote:


    However, a lot of authors
    are *incredibly* picky about what diff options you
    use, so I find you
    usually have to re-send at least once with some
    option you've never
    heard of added to the list :)
    See also: hand-wringing, gesticulating, speaking in tongues.

    Leonard A. Jaffe lenjaffe at jaffesystems.com
    Leonard Jaffe Computer Systems Consulting Ltd.
    Columbus, OH, USA 614-404-4214 F: 530-380-7423
  • Matt S Trout at May 10, 2006 at 5:39 pm

    Len Jaffe wrote:
    --- Matt S Trout wrote:

    However, a lot of authors
    are *incredibly* picky about what diff options you
    use, so I find you
    usually have to re-send at least once with some
    option you've never
    heard of added to the list :)
    See also: hand-wringing, gesticulating, speaking in tongues.
    And on one occasion the stunningly unexpected response of "it doesn't
    matter really, since I much prefer to apply changes by hand".
  • Wade Stuart at May 10, 2006 at 5:53 pm

    Len Jaffe wrote:
    --- Matt S Trout wrote:

    However, a lot of authors
    are *incredibly* picky about what diff options you
    use, so I find you
    usually have to re-send at least once with some
    option you've never
    heard of added to the list :)
    See also: hand-wringing, gesticulating, speaking in tongues.
    And on one occasion the stunningly unexpected response of "it doesn't
    matter really, since I much prefer to apply changes by hand".
    That would be a side effect of many, many rounds of the same patches to the
    list with different non-requested flags.

    I have seen more than one maintainer throw his arms up in frustration and
    just take the whole file or tar to generate his own diff/patch/hand
    changes.

    -Wade
  • Dave Hodgkinson at May 10, 2006 at 7:17 pm

    On 10 May 2006, at 18:11, Matt S Trout wrote:

    Dave Hodgkinson wrote:
    On 10 May 2006, at 16:28, Matt S Trout wrote:

    Len Jaffe wrote:
    Don't you know? You aren't really supposed to patch
    open source code. Only free software. If you patch
    open source code, the maintainers roll their eyes at
    you and sigh or mutter.
    I thought that was the unquiet dead that did that.

    Anyway, it's GPL, so it's free software too. *disappears in a puff
    of logic*
    Given that it's *years* since I did a diff in anger, a quick recipe
    on how to submit patches would be welcome...
    If they have an svn repo I usually check it out, edit in place and
    send
    an svk diff
    Now that's good to know.
    If not, unpacking the tar, cp -pR ing it to -orig, and doing a diff
    -ur
    across the two dirs when done seems to be ok. However, a lot of
    authors
    are *incredibly* picky about what diff options you use, so I find you
    usually have to re-send at least once with some option you've never
    heard of added to the list :)
    And some that they've made up...

    --
    Dave Hodgkinson - Music photography
    http://www.hodgkinson.org/
  • Fernan Aguero at May 10, 2006 at 5:19 pm
    +----[ Dave Hodgkinson (10.May.2006 12:41):
    Given that it's *years* since I did a diff in anger, a quick recipe
    on how to submit patches would be welcome...
    +----]

    cp file file.orig
    [edit file at leisure and save your changes]
    diff -u file.orig file > file.diff
    and send your patch (file.diff) as an attachment

    the usual recommendations apply: do not touch whitespace/tabs or other
    minor (i.e. unimportant) stuff. This will make your diff cleaner and
    thus easier to read.

    Fernan
  • Matt S Trout at May 10, 2006 at 5:38 pm

    Fernan Aguero wrote:
    +----[ Dave Hodgkinson (10.May.2006 12:41):
    Given that it's *years* since I did a diff in anger, a quick recipe
    on how to submit patches would be welcome...
    +----]

    cp file file.orig
    [edit file at leisure and save your changes]
    diff -u file.orig file > file.diff
    and send your patch (file.diff) as an attachment

    the usual recommendations apply: do not edit
    whitespace/tabs and minor (ie not important stuff). This
    will make your diff cleaner and thus easier to read.
    I'd really recommend doing a recursive diff on the entire tree. Makes
    things much easier for maintainers since they can just cd to the root of
    their checkout and do patch -p0 <rdiff.file
  • Roy-Magne Mo at May 8, 2006 at 1:31 pm

    On Mon, 08.05.2006 at 13:30 (+0100), Peter Edwards wrote:

    Set up the DNS for your application to map to multiple IP addresses, one
    each for however many web server machines you need. Run your perl apps on
    those.
    I do not agree with this; DNS load balancing is crude and leaves a lot
    up to the implementation of the client.

    It all depends on the budget, but setting up two obsolete/cheap servers
    as an LVS (Linux Virtual Server) pair in front of your real servers will
    bring you quite close to what you want. Design for failure of one or
    more nodes.
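
    For a flavour of what the LVS side involves, a minimal sketch driving
    ipvsadm from Perl (the addresses are placeholders; a real setup would
    normally run keepalived or ldirectord on top so that dead real servers
    get dropped automatically):

        #!/usr/bin/perl
        # Minimal LVS setup via ipvsadm: one virtual service IP, round-robin over
        # two real web servers using direct routing. Run as root on the LVS box.
        use strict;
        use warnings;

        my $vip   = '192.0.2.1:80';                        # placeholder virtual service
        my @reals = ( '10.0.0.11:80', '10.0.0.12:80' );    # placeholder backends

        # Create the virtual service with a round-robin scheduler.
        system('ipvsadm', '-A', '-t', $vip, '-s', 'rr') == 0
            or die "ipvsadm -A failed: $?";

        # Attach each real server (-g = direct routing / gatewaying).
        for my $real (@reals) {
            system('ipvsadm', '-a', '-t', $vip, '-r', $real, '-g') == 0
                or die "ipvsadm -a failed for $real: $?";
        }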

    You will of course also need to think about the network design and how
    much you are willing to put into it. A nice setup with quagga and BGP on
    the LVS nodes could possibly work well :)

    Test, test, test and test this setup before putting it into production.

    If you are running SSL, you also might want to offload the SSL to
    separate servers.

    If you are using MySQL, look into the new clustering options.

    Running a high-volume site is always a continuous process; this is
    probably just step 1.

    --
    Roy-Magne Mo <rmo at sunnmore.net>
