Hi
I was wondering a few days ago how one would create a cluster of
Catalyst webapps?
Some of my early thoughts included just having multiple machines
running Apache behind a load balancer.
But you then still have a single point of failure at the load balancer.
Another problem is that if you use some sort of database to store your
sessions etc., you have another point of failure.
So how can an average small company improve the reliability of its
(Catalyst) webapps without breaking the budget?
Gert Burger
[Catalyst] Clustering catalyst apps
-
Peter Edwards at May 8, 2006 at 12:26 pm ⇧
Set up the DNS for your application to map to multiple IP addresses, one
for each web server machine you need. Run your Perl apps on those.
Have a single database server machine with RAID-mirrored disks. Have your
Perl apps connect to that.
Regularly back up your database across the net to a disaster recovery (DR)
machine at a different physical location. With MySQL you can do a hotcopy
then rsync the files across. Set up the DR server so it can also be a web
server.
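As a rough illustration of the hotcopy-then-rsync step, here is a minimal Python sketch that only builds the two commands. The database name, directories and DR host name are made up for the example, and it assumes the mysqlhotcopy tool is available (it only handles MyISAM and ARCHIVE tables and is no longer shipped with recent MySQL releases), so treat it as a starting point rather than a finished backup script.

```python
import shlex


def build_backup_commands(db_name, local_dir, dr_host, dr_dir):
    """Return the commands for a hotcopy-then-rsync backup as argv lists."""
    # Step 1: mysqlhotcopy copies the table files of db_name under local_dir.
    hotcopy = ["mysqlhotcopy", db_name, local_dir]
    # Step 2: rsync ships the copy to the DR machine
    # (-a archive mode, -z compression, --delete drops files gone locally).
    rsync = ["rsync", "-az", "--delete", f"{local_dir}/", f"{dr_host}:{dr_dir}/"]
    return [hotcopy, rsync]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    commands = build_backup_commands(
        "shop", "/var/backups/mysql", "dr.example.com", "/var/backups/mysql"
    )
    for argv in commands:
        # Print instead of executing; subprocess.run(argv, check=True)
        # would run each step and stop on the first failure.
        print(shlex.join(argv))
```

Run from cron, the interval between runs is also the window of transactions you can lose if the database host dies.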
Failures:
Disk - switch to mirror until you can replace the disk
Database host - switch your web apps to the DR server for database access;
have an application strategy on what to do with delayed transactions that
happened since the last database synchronisation [1]
Network/Datacentre - point DNS to DR server and use its web server (poor
performance, but at least limited access is available)
Assuming you've got your servers in a data centre with triple connections to
the Internet backbone, this last scenario is very unlikely.
A lot depends on how many users, how critical up-time is, what the cost
equation is between having an alternative site and hardware versus the
opportunity cost of lost sales and damaged reputation. The above works well
for 10-150 concurrent users. For more you could consider using the
clustering and failover features that come with some databases.
[1] For example, if you manage to recover the transaction log from the main
db server you can merge the records in later provided your app hasn't
allocated overlapping unique ids to its record keys.
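
To make the point in [1] concrete, here is a small Python sketch of merging recovered records into the DR copy. The record layout (a dict keyed by unique id) is an assumption made for the example, not something taken from a real schema. Rows whose id was reused with different content are set aside instead of being merged.

```python
def merge_recovered(dr_rows, recovered_rows):
    """Merge rows recovered from the failed primary into the DR copy.

    Both arguments map a unique record id to its row. A recovered row is
    merged unless its id already exists in the DR copy with different
    content; such rows are returned separately for manual review.
    """
    merged = dict(dr_rows)
    conflicts = {}
    for record_id, row in recovered_rows.items():
        if record_id in merged and merged[record_id] != row:
            conflicts[record_id] = row  # same id, different data: overlap
        else:
            merged[record_id] = row
    return merged, conflicts


if __name__ == "__main__":
    dr = {1: "order A", 2: "order B", 4: "order D (taken on DR)"}
    recovered = {2: "order B", 3: "order C", 4: "order X (taken on primary)"}
    merged, conflicts = merge_recovered(dr, recovered)
    print(sorted(merged))     # ids present after the merge
    print(sorted(conflicts))  # ids that need a human decision
```

If the two servers hand out ids from disjoint ranges, the conflicts dict stays empty and the merge is mechanical.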
Regards, Peter
-----Original Message-----
From: catalyst-bounces at lists.rawmode.org
[mailto:catalyst-bounces at lists.rawmode.org] On Behalf Of Gert Burger
Sent: 08 May 2006 12:41
To: The elegant MVC web framework
Subject: [Catalyst] Clustering catalyst apps
_______________________________________________
Catalyst mailing list
Catalyst at lists.rawmode.org
http://lists.rawmode.org/mailman/listinfo/catalyst
-
Peter Edwards at May 8, 2006 at 12:27 pm ⇧
(I've put some more linebreaks in this time)
-
Peter Edwards at May 8, 2006 at 12:30 pm ⇧
(I've put some more linebreaks in this time)
-
Gert Burger at May 8, 2006 at 12:45 pm ⇧
Thanks for the reply, here are some of my comments on this:
Using round-robin DNS still means that if 50% of the servers are down,
50% of all queries will go to the broken machines, which will piss off
half your customers.
I have looked at the High Availability systems that have been written
for Linux, and they provide doubles (or more) of everything, from load
balancers to db servers. The issue I have with them is that they require a
great deal of money in hardware to get running.
Anyway, back to my issue: how do websites like Slashdot and Amazon, all of
which use Perl, keep uptimes of close to 99.999%?
And is it possible to get to that level with lots of crappy hardware?
Cheers
PS. Excuse me for meddling with the semi-impossible.
-
Joe Landman at May 8, 2006 at 12:55 pm ⇧
Gert Burger wrote:
> Thanks for the reply, here are some of my comments on this:
> Using round-robin DNS still means that if 50% of the servers are down,
> 50% of all queries will go to the broken machines, which will piss off
> half your customers.
Hmmm... with DNS proxies like dnsmasq and friends, this should not be an
issue.
> I have looked at the High Availability systems that have been written
> for Linux, and they provide doubles (or more) of everything, from load
> balancers to db servers. The issue I have with them is that they require a
> great deal of money in hardware to get running.
If you want highly available systems, this will cost you.
> Anyway, back to my issue: how do websites like Slashdot and Amazon, all of
> which use Perl, keep uptimes of close to 99.999%?
Designs with no single points of failure. Whether they are highly
available may be open to interpretation, but if you are going to stand
up a resource for use where the cost of being down (either economic or
equivalent cost) or the risk of unavailability is high, you are going to
want to make sure you have no single points of failure anywhere in your
process.
> And is it possible to get to that level with lots of crappy hardware?
Heh. No.
Crappy hardware is as its name implies.
If you want highly reliable stuff, you are going to need to purchase
non-crappy hardware. This doesn't mean expensive hardware, just don't
buy the obvious crap. Lots of hardware out there is crappy. Dealing
with such hardware is a nightmare. Would cost you less to throw it away
in many cases and start with non-crappy hardware.
You need to design with the thought that single or multiple failures
will not take down everything. Also, you need to design for active
monitoring, simple start/stop mechanisms, and related.
A nice DB system is indicated, mysql/postgresql should be fine. We use
SQLite3 for some of our stuff and shuttle the DB around, as it is small
enough for us to do this with.
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
-
Aristotle Pagaltzis at May 8, 2006 at 12:58 pm ⇧
* Gert Burger [2006-05-08 14:55]:
> Anyway, back to my issue: how do websites like Slashdot and
> Amazon, all of which use Perl, keep uptimes of close to 99.999%?
What does the fact that they use Perl have to do with their load
balancing? It's a red herring, isn't it?
> And is it possible to get to that level with lots of crappy hardware?
Might http://www.danga.com/perlbal/ help?
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
-
Len Jaffe at May 8, 2006 at 12:59 pm ⇧
--- Gert Burger wrote:
> Thanks for the reply, here are some of my comments on this:
> I have looked at the High Availability systems that have been written
> for Linux, and they provide doubles (or more) of everything, from load
> balancers to db servers. The issue I have with them is that they require a
> great deal of money in hardware to get running.
> Anyway, back to my issue: how do websites like Slashdot and Amazon, all of
> which use Perl, keep uptimes of close to 99.999%?
They use redundant hardware load balancers to front-end everything.
What you spend to maintain uptime will depend on the cost of downtime.
Len.
-
Perrin Harkins at May 8, 2006 at 3:09 pm ⇧
On Mon, 2006-05-08 at 14:45 +0200, Gert Burger wrote:
> I have looked at the High Availability systems that have been written
> for Linux, and they provide doubles (or more) of everything, from load
> balancers to db servers. The issue I have with them is that they require a
> great deal of money in hardware to get running.
You can't get high-availability for nothing.
> Anyway, back to my issue: how do websites like Slashdot and Amazon, all of
> which use Perl, keep uptimes of close to 99.999%?
> And is it possible to get to that level with lots of crappy hardware?
I don't think Slashdot can be considered highly available, but that's
beside the point. Yahoo, Google, etc. all get high-availability on
mostly cheap hardware, but they have the scale to buy lots of it and put
a lot of effort into making it work. If your budget is relatively low,
you will probably get more reliability by spending more on your key
components (the database server, the load balancer), since you won't
want to pay for lots of redundant hardware. In other words, if you
aren't willing to buy doubles of everything, buy better hardware so it
is less likely to fail.
You can't expect miracles though -- real high availability is achieved
by having redundant hardware, hiring skilled personnel, and repeatedly
testing your failover plan. That's how the companies you mentioned do
it.
Slashdot doesn't need real high-availability so they have adopted a
strategy that might be more applicable to you, i.e. better hardware but
less of it. It's described here:
http://slashdot.org/faq/tech.shtml#te050
- Perrin
-
Peter Edwards at May 8, 2006 at 3:30 pm ⇧
Hi Gert, I think the key here is "average small company".
My gut feel is that two relatively cheap rented servers at different
datacentres using an (admittedly) crude DNS approach is enough to run most
small companies' web services reliably and cheaply.
I'm not sure if you need more. The sort of questions I'd ask about your
customer are:
How much money are they willing to spend per month?
Is it online retail or chat-based or some other service?
How many transactions per hour are they handling?
Is timeliness critical?
Do they handle a few big customers who need perfect service, or many smaller
customers where they can afford to lose a few due to downtime?
Assuming you do need more, there are a couple of aspects:
1) Scalability
2) Reliability
Scalability.
The model I suggested lets you scale up to about 150 concurrent users. Most
small companies would be delighted to have that many :)
Reliability.
High availability doesn't necessarily mean that you have to have 100% of
your system's functionality on-line immediately. For example, if you still
have the web pages coming up, can see contact names and phone numbers, but
maybe it takes 30 minutes for the latest orders to reappear (via your
recovery process) then for many users that is going to be acceptable and not
considered an outage in service.
If you run a watchdog process to flip the DNS on failure and initiate a DR
process (or use something like perlbal that Aristotle suggested) you can
have a high *perceived* uptime approaching what your customer asked for.
Write the SLA in your support contract carefully and that might be enough.
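
A watchdog of the kind described above might look roughly like this in Python. This is only a sketch: the health check and the fail_over hook are placeholders you would have to supply (how you actually repoint DNS depends entirely on your DNS provider), and the threshold of three consecutive failures is an arbitrary choice to avoid flipping on a single dropped request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    check: Callable[[], bool]      # returns True when the primary site answers
    fail_over: Callable[[], None]  # hypothetical hook: repoint DNS, start DR process
    threshold: int = 3             # consecutive failures tolerated before failover
    failures: int = 0
    failed_over: bool = False

    def poll(self) -> bool:
        """Run one health check; return True if this poll triggered failover."""
        if self.failed_over:
            return False           # already switched, nothing more to do
        if self.check():
            self.failures = 0      # a success resets the failure streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.fail_over()
            self.failed_over = True
            return True
        return False


if __name__ == "__main__":
    # Simulated check results instead of real network probes.
    results = iter([False, True, False, False, False])
    dog = Watchdog(check=lambda: next(results),
                   fail_over=lambda: print("switching DNS to DR server"))
    for _ in range(5):
        dog.poll()
```

In real use poll() would be called from a loop with a sleep between checks, ideally from a machine outside the primary datacentre.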
To achieve a real 99.99% uptime with 100% functionality is going to cost you
a lot more... design time, testing, hardware, network, database licences,
monitoring, support staff.
Put it another way, are you selling them a "Rolls Royce" solution or a
diesel van? I know which most small businesses are going to go for. Of
course they'd love a Rolls Royce - as long as you pay for it - but all their
similar-sized competitors are driving diesel vans.
I'm not suggesting you skimp on remote monitoring, or the use of a TPM if
you really need it, just that a combination of KISS and customer expectation
management will save you money and trouble.
Regards, Peter
www.dragonstaff.com
-
Wade Stuart at May 8, 2006 at 6:06 pm ⇧
catalyst-bounces at lists.rawmode.org wrote on 05/08/2006 10:30:03 AM:
> Hi Gert, I think the key here is "average small company".
> My gut feel is that two relatively cheap rented servers at different
> datacentres using an (admittedly) crude DNS approach is enough to run most
> small companies' web services reliably and cheaply.
> I'm not sure if you need more. The sort of questions I'd ask about your
> customer are:
> How much money are they willing to spend per month?
> Is it online retail or chat-based or some other service?
> How many transactions per hour are they handling?
> Is timeliness critical?
> Do they handle a few big customers who need perfect service, or many smaller
> customers where they can afford to lose a few due to downtime?
> Assuming you do need more, there are a couple of aspects
> 1) Scalability
> 2) Reliability
Another thing you can do, and I have actually just seen this done first
hand (sad), is figure out the lowest uptime percentage that customers will
not baulk at in an SLA.
Let's say you choose 94% uptime.
Now, figure in how many hours your environment will be down per week.
68 hours?
OK, that puts you way below 94% uptime. So now start removing hours from
a 24x7 week.
Start with the weekends -- they are expendable (who uses applications on
the weekend?) -- this gets you a "free" 48 hours of downtime per week.
That leaves 94% on 24x5, but you are still way below your 94% uptime SLA
number. Just calculate how many hours you need to lose from 24x5 to get to
94%. In this case about two per day.
So put 94% uptime based on 22x5 in your SLA.
Sounds good on paper; when you do the math, though, it looks like 61.3%
uptime on a 24x7 schedule.
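
For anyone who wants to check that kind of SLA arithmetic, here is a short Python sketch. It simply assumes that everything outside the SLA window may be downtime, which is the worst case the wording allows. With these inputs it comes out at about 61.5%, close to the figure quoted above.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def effective_uptime(sla_fraction, hours_per_day, days_per_week):
    """Worst-case uptime over a full week when the SLA covers a smaller window."""
    covered = hours_per_day * days_per_week  # hours the SLA applies to
    guaranteed_up = sla_fraction * covered   # hours of promised service
    return guaranteed_up / HOURS_PER_WEEK


if __name__ == "__main__":
    # 94% measured over 22x5 versus 94% measured over the whole week.
    print(f"22x5: {effective_uptime(0.94, 22, 5):.1%}")
    print(f"24x7: {effective_uptime(0.94, 24, 7):.1%}")
```

The same function makes it easy to see what any "N% of business hours" promise means on a 24x7 basis.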
Of course I am being facetious, but I have just had these exact numbers
given to me for one of our vendors' services -- gogo outsourcing.
Wade
-
Matt S Trout at May 8, 2006 at 6:16 pm ⇧
Gert Burger wrote:
> Thanks for the reply, here are some of my comments on this:
> Using round-robin DNS still means that if 50% of the servers are down,
> 50% of all queries will go to the broken machines, which will piss off
> half your customers.
> I have looked at the High Availability systems that have been written
> for Linux, and they provide doubles (or more) of everything, from load
> balancers to db servers. The issue I have with them is that they require a
> great deal of money in hardware to get running.
Not really. People like Dell will do you pretty damn good deals on
moderate 1U servers if you're buying a lot of them at once.
In the end though, you're probably going to have a single point of
failure somewhere - e.g. "we have two of every piece of kit but only one
backup generator for when the UPSen run out", or at the very least
"there's only one planet earth so if there's an extinction level
asteroid strike ..." :)
The point here is to push that single point of failure back as far as is
cost-effective. For a small company, making sure your border router is
bloody good and using two servers with a single service IP and failover
is often good enough.
-
Dave C at May 8, 2006 at 8:14 pm ⇧
On 5/8/06, Gert Burger wrote:
> Thanks for the reply, here are some of my comments on this:
Disclaimer: I work for a large hosting company (shameless:
http://www.hostway.com) and I specialize in designing highly available
clusters for large customers using all Open Source, freely available
software running on both (depending on the customer) "crappy" and
non-crappy systems (we host parts of foxnews.com, orbitz, Wikipedia,
and others).
The key to offering "five nines" availability (99.999%, or roughly 5
minutes of downtime a year) is to examine faults in every aspect, including
application, hardware, network, facility, and OS, to identify single
points of failure. Then just design around them. Even down to such
details as plugging servers into different power strips on separate
phases (may seem obvious, but you'd be surprised what I've seen bring a
cluster down), and using IP addresses located on different subnets,
etc.
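
To put numbers on the "nines", here is a small Python sketch that converts an availability percentage into a yearly downtime budget. It assumes a 365-day year. Five nines works out to a little over five minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year allowed at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

Each extra nine cuts the budget by a factor of ten, which is roughly why each one costs so much more than the last.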
On a larger scale, we happen to offer a global caching platform
similar to Akamai built on pure Open Source software which will route
around an entire data center going offline (we have ten different data
centers).
> Using round-robin DNS still means that if 50% of the servers are down,
> 50% of all queries will go to the broken machines, which will piss off
> half your customers.
Not necessarily. Both google.com and yahoo.com use RR DNS:
host www.google.com
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 64.233.161.99
www.l.google.com has address 64.233.161.104
www.l.google.com has address 64.233.161.14
host www.yahoo.com
www.yahoo.com is an alias for www.yahoo.akadns.net.
www.yahoo.akadns.net has address 68.142.226.41
www.yahoo.akadns.net has address 68.142.226.32
www.yahoo.akadns.net has address 68.142.226.38
www.yahoo.akadns.net has address 68.142.226.52
www.yahoo.akadns.net has address 68.142.226.34
www.yahoo.akadns.net has address 68.142.226.37
www.yahoo.akadns.net has address 68.142.226.53
www.yahoo.akadns.net has address 68.142.226.55
However, they lower the TTL on the records to under 60 seconds, which
allows for changes to be made quickly. Using monitoring software
like nagios, monit, or your own using Test::WWW::Mechanize::Catalyst,
one could connect to the application on each alias and, if there is an
error, yank that IP from DNS.
> Anyway, back to my issue: how do websites like Slashdot and Amazon, all of
> which use Perl, keep uptimes of close to 99.999%?
They use multiple layers of redundancy. As I outlined above, the
first point would be RR DNS; then each of the IPs returned is
connected to some sort of load balancer (hardware, possibly using
BigIP, Foundry, or Cisco gear, or software using LVS). There's some
reverse proxying being done, connecting to query caches for
database-intensive work, then returning the response back to the client.
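
The "check each address and yank the bad ones" idea mentioned earlier could be sketched like this in Python. Note the assumptions: a bare TCP connect stands in for a real application-level test (something like Test::WWW::Mechanize::Catalyst would check far more), the addresses are documentation placeholders, and the step that actually updates DNS is left out because it depends on your DNS server.

```python
import socket


def tcp_alive(ip, port=80, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def split_pool(addresses, is_alive=tcp_alive):
    """Partition round-robin addresses into (keep, yank) lists."""
    keep, yank = [], []
    for ip in addresses:
        (keep if is_alive(ip) else yank).append(ip)
    return keep, yank


if __name__ == "__main__":
    # Placeholder addresses; a fake check is used so no network is needed.
    pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    down = {"192.0.2.11"}
    keep, yank = split_pool(pool, is_alive=lambda ip: ip not in down)
    print("keep in DNS:", keep)
    print("remove from DNS:", yank)
```

With a TTL under 60 seconds, clients should stop being sent to a yanked address within about a minute of the record changing.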
For a good outline of how LiveJournal uses open source software for
high availability, check
http://www.danga.com/words/2004_oscon/oscon2004.pdf
> And is it possible to get to that level with lots of crappy hardware?
Yes, Google actually designs around this. They don't even use
hardware RAID in their systems and are said to use commodity equipment
costing roughly $1000/piece.
http://www.internetnews.com/xSP/article.php/3487041
dave.
-
Johan Lindström at May 8, 2006 at 9:12 pm ⇧
At 22:14 2006-05-08, Dave C wrote:
> The key to offer the "five nines" availability (99.999%, or under 5
> minutes a year) is to examine faults in every aspect, including
> application, hardware, network, facility, and OS to identify single
> points of failure.

If you go that far, don't forget to make sure your two independent ISPs
really are independent and don't buy their upstream bandwidth from the same
provider :)
That happened to us a couple of years ago; the upstream provider had some
downtime and we were mightily upset.
/J
-
Matt S Trout at May 8, 2006 at 9:46 pm ⇧
Johan Lindström wrote:
> At 22:14 2006-05-08, Dave C wrote:
>> The key to offer the "five nines" availability (99.999%, or under 5
>> minutes a year) is to examine faults in every aspect, including
>> application, hardware, network, facility, and OS to identify single
>> points of failure.
>
> If you go that far, don't forget to make sure your two independent ISPs
> really are independent and don't buy their upstream bandwidth from the same
> provider :)
>
> That happened to us a couple of years ago; the upstream provider had some
> downtime and we were mightily upset.

At $ork[-mumble], we had two links, both physically entirely separate
(our BNetworkAdminFH had ensured that they even went out different sides
of the building). Unfortunately, one time some bastards half-filled a
bunch of wheely-bins full of petrol, lit it, waited a few seconds, then
emptied the bins down carefully-chosen manholes over comms line
intersections, reducing fibre-optic bundles to slag quickly and
effectively. They got about 80% of the major intersections in the area,
naturally including both our lines.
Sometimes even not having a single point of failure won't save you.
-
Jules Agee at May 9, 2006 at 6:31 pm ⇧
Dave C wrote:
> On a larger scale, we happen to offer a global caching platform
> similar to Akamai built on pure Open Source software which will route
> around an entire data center going offline (we have ten different data
> centers).

Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
interesting solution for inexpensive http reverse-proxy, failover, load
balancing, ssl wrapper, etc.
--
Jules Agee
System Administrator
Pacific Coast Feather Co.
julesa at pcf.com x284
-
Matt S Trout at May 9, 2006 at 6:44 pm ⇧
Jules Agee wrote:
> Dave C wrote:
>> On a larger scale, we happen to offer a global caching platform
>> similar to Akamai built on pure Open Source software which will route
>> around an entire data center going offline (we have ten different data
>> centers).
>
> Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
> interesting solution for inexpensive http reverse-proxy, failover, load
> balancing, ssl wrapper, etc.

Looks interesting, although I think for serious scaling I'd probably
prefer perlbal due to its being somewhat smarter (and easier to extend).
Probably worth trialling and benchmarking both, mind.
-
Michael Alan Dorman at May 10, 2006 at 2:12 pm ⇧
Jules Agee <julesa at pcf.com> writes:
> Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
> interesting solution for inexpensive http reverse-proxy, failover, load
> balancing, ssl wrapper, etc.

I've used pound on a moderate-traffic (2M hits/day) site for about the
last three years. It is capable, simple to set up and reliable---I had
to log in to the proxy server to see when it had last been restarted
(Jan 30, incidentally).
I do wish it wasn't so ardent about spamming the logs about every
dropped connection---on a site with any traffic, this happens a lot,
so it's annoying.
Mike
--
Give me a Leonard Cohen afterworld
-
Matt S Trout at May 10, 2006 at 3:00 pm ⇧
Michael Alan Dorman wrote:
> Jules Agee <julesa at pcf.com> writes:
>> Anyone used pound <http://www.apsis.ch/pound/>? Looks like a pretty
>> interesting solution for inexpensive http reverse-proxy, failover, load
>> balancing, ssl wrapper, etc.
>
> I've used pound on a moderate-traffic (2M hits/day) site for about the
> last three years. It is capable, simple to set up and reliable---I had
> to log in to the proxy server to see when it had last been restarted
> (Jan 30, incidentally).
> I do wish it wasn't so ardent about spamming the logs about every
> dropped connection---on a site with any traffic, this happens a lot,
> so it's annoying.

So patch it :D, it's open source after all.
-
Len Jaffe at May 10, 2006 at 3:09 pm ⇧
--- Matt S Trout wrote:
> Michael Alan Dorman wrote:
>> I do wish it wasn't so ardent about spamming the logs about every
>> dropped connection---on a site with any traffic, this happens a lot,
>> so it's annoying.
>
> So patch it :D, it's open source after all.

Don't you know? You aren't really supposed to patch
open source code. Only free software. If you patch
open source code, the maintainers roll their eyes at
you and sigh or mutter.
-
Matt S Trout at May 10, 2006 at 3:28 pm ⇧
Len Jaffe wrote:
> Don't you know? You aren't really supposed to patch
> open source code. Only free software. If you patch
> open source code, the maintainers roll their eyes at
> you and sigh or mutter.

I thought that was the unquiet dead that did that.

Anyway, it's GPL, so it's free software too. *disappears in a puff of logic*
-
Dave Hodgkinson at May 10, 2006 at 3:35 pm ⇧
On 10 May 2006, at 16:28, Matt S Trout wrote:
> Len Jaffe wrote:
>> Don't you know? You aren't really supposed to patch
>> open source code. Only free software. If you patch
>> open source code, the maintainers roll their eyes at
>> you and sigh or mutter.
>
> I thought that was the unquiet dead that did that.
>
> Anyway, it's GPL, so it's free software too. *disappears in a puff
> of logic*

Given that it's *years* since I did a diff in anger, a quick recipe
on how to submit patches would be welcome...
-
Matt S Trout at May 10, 2006 at 5:11 pm ⇧
Dave Hodgkinson wrote:
> On 10 May 2006, at 16:28, Matt S Trout wrote:
>> Len Jaffe wrote:
>>> Don't you know? You aren't really supposed to patch
>>> open source code. Only free software. If you patch
>>> open source code, the maintainers roll their eyes at
>>> you and sigh or mutter.
>>
>> I thought that was the unquiet dead that did that.
>>
>> Anyway, it's GPL, so it's free software too. *disappears in a puff
>> of logic*
>
> Given that it's *years* since I did a diff in anger, a quick recipe
> on how to submit patches would be welcome...

If they have an svn repo I usually check it out, edit in place and send
an svk diff.

If not, unpacking the tar, cp -pR ing it to -orig, and doing a diff -ur
across the two dirs when done seems to be ok. However, a lot of authors
are *incredibly* picky about what diff options you use, so I find you
usually have to re-send at least once with some option you've never
heard of added to the list :)
-
Len Jaffe at May 10, 2006 at 5:34 pm ⇧
--- Matt S Trout wrote:
> However, a lot of authors
> are *incredibly* picky about what diff options you use, so I find you
> usually have to re-send at least once with some option you've never
> heard of added to the list :)

See also: hand-wringing, gesticulating, speaking in tongues.

Leonard A. Jaffe lenjaffe at jaffesystems.com
Leonard Jaffe Computer Systems Consulting Ltd.
Columbus, OH, USA 614-404-4214 F: 530-380-7423
-
Matt S Trout at May 10, 2006 at 5:39 pm ⇧
Len Jaffe wrote:
> --- Matt S Trout wrote:
>> However, a lot of authors
>> are *incredibly* picky about what diff options you use, so I find you
>> usually have to re-send at least once with some option you've never
>> heard of added to the list :)
>
> See also: hand-wringing, gesticulating, speaking in tongues.

And on one occasion the stunningly unexpected response of "it doesn't
matter really, since I much prefer to apply changes by hand".
-
Wade Stuart at May 10, 2006 at 5:53 pm ⇧
> Len Jaffe wrote:
>> --- Matt S Trout wrote:
>>> However, a lot of authors
>>> are *incredibly* picky about what diff options you use, so I find you
>>> usually have to re-send at least once with some option you've never
>>> heard of added to the list :)
>>
>> See also: hand-wringing, gesticulating, speaking in tongues.
>
> And on one occasion the stunningly unexpected response of "it doesn't
> matter really, since I much prefer to apply changes by hand".

That would be a side effect of many, many rounds of the same patches to the
list with different non-requested flags.
I have seen more than one maintainer throw his arms up in frustration and
just take the whole file or tar to generate his own diff/patch/hand
changes.
-Wade
-
Dave Hodgkinson at May 10, 2006 at 7:17 pm ⇧
On 10 May 2006, at 18:11, Matt S Trout wrote:
> Dave Hodgkinson wrote:
>> On 10 May 2006, at 16:28, Matt S Trout wrote:
>>> Len Jaffe wrote:
>>>> Don't you know? You aren't really supposed to patch
>>>> open source code. Only free software. If you patch
>>>> open source code, the maintainers roll their eyes at
>>>> you and sigh or mutter.
>>>
>>> I thought that was the unquiet dead that did that.
>>>
>>> Anyway, it's GPL, so it's free software too. *disappears in a puff
>>> of logic*
>>
>> Given that it's *years* since I did a diff in anger, a quick recipe
>> on how to submit patches would be welcome...
>
> If they have an svn repo I usually check it out, edit in place and
> send an svk diff

Now that's good to know.

> If not, unpacking the tar, cp -pR ing it to -orig, and doing a diff
> -ur across the two dirs when done seems to be ok. However, a lot of
> authors are *incredibly* picky about what diff options you use, so I
> find you usually have to re-send at least once with some option you've
> never heard of added to the list :)

And some that they've made up...
-
Fernan Aguero at May 10, 2006 at 5:19 pm ⇧
+----[ Dave Hodgkinson (10.May.2006 12:41):
Given that it's *years* since I did a diff in anger, a quick recipe
on how to submit patches would be welcome...
+----]
cp file file.orig
[edit file at leisure and save your changes]
diff -u file.orig file > file.diff
and send your patch (file.diff) as an attachment
the usual recommendations apply: do not edit
whitespace/tabs and minor (ie not important stuff). This
will make your diff cleaner and thus easier to read.
Fernan
-
Matt S Trout at May 10, 2006 at 5:38 pm ⇧
Fernan Aguero wrote:
> +----[ Dave Hodgkinson (10.May.2006 12:41):
> Given that it's *years* since I did a diff in anger, a quick recipe
> on how to submit patches would be welcome...
> +----]
>
> cp file file.orig
> [edit file at leisure and save your changes]
> diff -u file.orig file > file.diff
> and send your patch (file.diff) as an attachment
> the usual recommendations apply: do not edit
> whitespace/tabs and minor (ie not important stuff). This
> will make your diff cleaner and thus easier to read.

I'd really recommend doing a recursive diff on the entire tree. It makes
things much easier for maints since they can just cd to the root of
their checkout and do patch -p0 <rdiff.file
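
For anyone wanting the whole-tree recipe from this thread in one place, here is a rough, self-contained sketch. The directory name pound-x.y, the README contents and the file name my-change.diff are placeholders made up for the demo, not anything from the real pound tarball. One caveat on the strip level: a diff taken across two sibling directories carries one leading directory in each path, so this sketch applies it with -p1 from inside the tree; -p0 fits a diff generated from within the tree root, as svn diff does.

```shell
#!/bin/sh
# Sketch of the directory-based patch workflow; "pound-x.y" is a placeholder.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for an unpacked tarball (contents invented for the demo).
mkdir pound-x.y
printf 'log every dropped connection\n' > pound-x.y/README

# Keep a pristine copy, then edit the working tree.
cp -pR pound-x.y pound-x.y-orig
printf 'log dropped connections only in debug mode\n' > pound-x.y/README

# diff exits 1 when the trees differ, so do not let "set -e" abort here.
diff -ur pound-x.y-orig pound-x.y > my-change.diff || [ $? -eq 1 ]

# Maintainer side: apply from the root of a pristine tree.
# -p1 strips the leading "pound-x.y/" directory from the paths in the diff.
cp -pR pound-x.y-orig maintainer-tree
(cd maintainer-tree && patch -p1 < ../my-change.diff)
```

The file my-change.diff is what would go out as the attachment. As noted above, some maintainers ask for other diff options, so treat -ur as a starting point.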
-
Roy-Magne Mo at May 8, 2006 at 1:31 pm ⇧
On Mon 08.05.2006 at 13:30 (+0100), Peter Edwards wrote:
> (I've put some more linebreaks in this time)
> Set up the DNS for your application to map to multiple IP addresses, one
> each for however many web server machines you need. Run your perl apps on
> those.

I do not agree with this, DNS load balancing is crude and leaves a lot
up to the implementation of the client.

It all depends on the budget, but setting up two obsolete/cheap servers
as an LVS (Linux Virtual Server) in front of your real servers will bring
you quite close to what you want. Design for failure of one or more
nodes.
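
As a rough idea of what the LVS suggestion above involves on a director, here is a minimal sketch using ipvsadm in NAT mode with round-robin scheduling. setup_lvs is a hypothetical helper name and the addresses are supplied by the caller; the choice of NAT mode and port 80 is an assumption for the example, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: one virtual HTTP service on an LVS director, NAT mode, round robin.
# setup_lvs is a hypothetical helper; addresses come from the caller.

# setup_lvs VIP REAL_IP...
setup_lvs() {
    vip=$1
    shift
    # -A adds a virtual service, -t names a TCP service, -s rr picks round robin.
    ipvsadm -A -t "$vip:80" -s rr
    for rip in "$@"; do
        # -a adds a real server behind the service, -m selects masquerading (NAT).
        ipvsadm -a -t "$vip:80" -r "$rip:80" -m
    done
}
```

Calling `setup_lvs 192.0.2.1 10.0.0.11 10.0.0.12` (example addresses) needs root and the IPVS kernel support. Failover between the two directors, and removing dead real servers from the table, are separate pieces not shown here.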
You will of course also need to think about the network design and how
much you want to put into it. A nice setup with quagga and BGP on the
LVS nodes could possibly work well :)
Test, test, test and test this setup before putting it into production.
If you are running SSL, you also might want to offload the SSL to
separate servers.
If you are using MySQL, look into the new clustering options.
Running a high volume site is always a continuous process, this is
probably just step 1.
--
Roy-Magne Mo <rmo at sunnmore.net>
Discussion Overview
| group | catalyst |
| categories | catalyst, perl |
| posted | May 8, '06 at 11:41a |
| active | May 10, '06 at 7:17p |
| posts | 31 |
| users | 16 |
| website | catalystframework.org |
| irc | #catalyst |
