On Jun 7, 2011, at 12:07 AM, sanjeev.taran@us.pwc.com wrote:

Hello,

I wanted to know if anyone has any tips or tutorials on how to install a
Hadoop cluster across multiple datacenters.

Generally, this is a bad idea. Why?
1) Inter-datacenter bandwidth is expensive compared to cluster bandwidth.
2) This extra topological constraint is not currently well-modeled in the Hadoop architecture. This means that you will likely find assumptions in the software that are not true in the inter-datacenter case.
3) None of the biggest users currently do this. Until you are ready to put serious money into the game, stick with what is well established to work.
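
To put a rough number on point 1, here is a back-of-the-envelope sketch in Python. The data size and link speeds are made-up assumptions for illustration, not measurements from any real cluster, and the function name transfer_hours is hypothetical. It only shows how the time to move a fixed amount of data scales with the bandwidth available between nodes.

```python
def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Return the hours needed to move data_tb terabytes over a link
    that sustains link_gbps gigabits per second (decimal units)."""
    bits = data_tb * 1e12 * 8           # terabytes -> bits
    seconds = bits / (link_gbps * 1e9)  # bits / (bits per second)
    return seconds / 3600


# Hypothetical figures, chosen only to illustrate the gap.
DATA_TB = 10.0             # assumed amount of data to move
INTRA_CLUSTER_GBPS = 10.0  # assumed bandwidth inside one datacenter
INTER_DC_GBPS = 1.0        # assumed bandwidth between datacenters

if __name__ == "__main__":
    local = transfer_hours(DATA_TB, INTRA_CLUSTER_GBPS)
    remote = transfer_hours(DATA_TB, INTER_DC_GBPS)
    print(f"inside the cluster:  {local:.1f} hours")
    print(f"between datacenters: {remote:.1f} hours")
```

Plug in your own link speeds and data volumes. The ratio between the two results is what matters, since anything that moves blocks between nodes (replication, shuffle) pays it.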

I would note that, in my other life, I work with a batch-oriented distributed computing system called Condor (http://www.cs.wisc.edu/condor/). Condor is designed to naturally span the globe (I've seen it spanning around 50 clusters). However, it is batch-job oriented, not data oriented. If you have to wedge your problem to fit into the MapReduce paradigm, this might be a good alternative.

Do you need ssh connectivity between the nodes across these data centers?

Definitely not. SSH is only used in the wrapper scripts to start the HDFS daemons. It's a usability crutch for smaller clusters that don't have proper management.

If your ops folks don't have a better way to manage what is running on your cluster, fire them.

Brian

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 7, '11 at 5:08a
active: Jun 7, '11 at 12:58p
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop
