FAQ
Hello,

I wanted to know if anyone has any tips or tutorials on how to install a
Hadoop cluster across multiple datacenters.

Do you need ssh connectivity between the nodes across these data centers?

Thanks in advance for any guidance you can provide.
______________________________________________________________________
The information transmitted, including any attachments, is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited, and all liability arising therefrom is disclaimed. If you received this in error, please contact the sender and delete the material from any computer. PricewaterhouseCoopers LLP is a Delaware limited liability partnership. This communication may come from PricewaterhouseCoopers LLP or one of its subsidiaries.

  • Shahnawaz Saifi at Jun 7, 2011 at 8:09 am
    Yes! ssh connectivity is required.

    --
    Thanks,
    Shah
  • Steve Loughran at Jun 7, 2011 at 9:53 am

    On 06/07/2011 06:07 AM, sanjeev.taran@us.pwc.com wrote:

    > I wanted to know if anyone has any tips or tutorials on how to install a
    > Hadoop cluster across multiple datacenters

    Nobody has come out and said they've built a single HDFS filesystem from
    multiple sites, primarily because the inter-site bandwidth/latency will
    be awful and there isn't any support for this in the topology model of
    Hadoop (there are some placeholders though).

    You could set up an HDFS filesystem in each datacentre, and use symbolic
    links (or the forthcoming federation) to pull data in. There's no reason
    why you can't start up a job on Datacentre-1 that starts reading some of
    its data from DC-2, after which all the work will be datacentre-local.

    > Do you need ssh connectivity between the nodes across these data centers?

    Depends on how you deploy Hadoop. You only need SSH if you use the
    built-in tooling; if you use large-scale cluster management tools then
    it's a non-issue.
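
The topology model mentioned in the reply above is usually fed by an external rack-awareness script: Hadoop passes host names or IP addresses as arguments and reads one network path per argument from standard output. The following is a minimal sketch of such a script, not a tested multi-datacenter setup. The host table, the `/dcN/rackN` naming, and the fallback path are all hypothetical, and the configuration property that points Hadoop at the script (`topology.script.file.name` in releases of that era) should be checked against your version. Whether block placement makes any use of the datacenter level is exactly the gap described above.

```python
#!/usr/bin/env python3
"""Sketch of a rack-awareness script that encodes a datacenter level."""
import sys

# Hypothetical inventory: replace with your own hosts and locations.
LOCATIONS = {
    "10.1.0.11": "/dc1/rack1",
    "10.1.0.12": "/dc1/rack2",
    "10.2.0.11": "/dc2/rack1",
}

# Hypothetical fallback; kept at the same depth as the known paths so that
# every node sits at the same level of the topology tree.
DEFAULT_LOCATION = "/default-dc/default-rack"


def resolve(hosts):
    """Return one network path per host, in the same order as the input."""
    return [LOCATIONS.get(host, DEFAULT_LOCATION) for host in hosts]


if __name__ == "__main__":
    # Print the paths separated by whitespace, one per argument received.
    print(" ".join(resolve(sys.argv[1:])))
```

Unknown hosts fall back to a default path rather than failing, so a missing inventory entry degrades placement quality instead of stopping the lookup.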
  • Michael Segel at Jun 7, 2011 at 11:11 am
    PWC now getting into Hadoop? Interesting....

    Sanjeev, the simple short answer is that you don't create a cloud that spans data centers. Bad design.
    You build two clusters, one per data center.


    To: common-user@hadoop.apache.org
    Subject: Hadoop Cluster Multi-datacenter
    From: sanjeev.taran@us.pwc.com
    Date: Mon, 6 Jun 2011 22:07:51 -0700

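
A back-of-the-envelope calculation helps show why the replies in this thread treat a single cluster spanning sites as bad design. The sketch below only does arithmetic; the data size and both link speeds are hypothetical figures chosen for illustration, not measurements from any real deployment.

```python
def transfer_hours(data_gb, link_mbit_per_s):
    """Hours needed to move data_gb gigabytes over a link of the given speed.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits) and ignores
    latency, protocol overhead, and contention.
    """
    bits = data_gb * 8 * 1000 ** 3
    seconds = bits / (link_mbit_per_s * 1000 ** 2)
    return seconds / 3600


# Hypothetical figures, for illustration only.
DATA_GB = 1000             # 1 TB of input data
INTRA_CLUSTER_MBIT = 1000  # assumed 1 Gbit/s between nodes in one cluster
INTER_DC_MBIT = 100        # assumed 100 Mbit/s share of an inter-site link

if __name__ == "__main__":
    print(f"intra-cluster: {transfer_hours(DATA_GB, INTRA_CLUSTER_MBIT):.1f} h")
    print(f"inter-site:    {transfer_hours(DATA_GB, INTER_DC_MBIT):.1f} h")
```

Under these assumed numbers the same terabyte takes roughly ten times longer to cross sites, and HDFS replication and shuffle traffic would cross that link repeatedly if one filesystem spanned both datacenters.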
  • Brian Bockelman at Jun 7, 2011 at 12:58 pm

    On Jun 7, 2011, at 12:07 AM, sanjeev.taran@us.pwc.com wrote:

    > I wanted to know if anyone has any tips or tutorials on how to install a
    > Hadoop cluster across multiple datacenters

    Generally, this is a bad idea. Why?
    1) Inter-datacenter bandwidth is expensive compared to cluster bandwidth.
    2) This extra topological constraint is not currently well-modeled in the Hadoop architecture. This means that you will likely find assumptions in the software that are not true in the inter-datacenter case.
    3) None of the biggest users currently do this. Unless you plan on putting serious money into the game, follow what is well-established to work.

    I would note that, in my other life, I work with a batch-oriented distributed computing system called Condor (http://www.cs.wisc.edu/condor/). Condor is designed to naturally span the globe (I've seen it spanning around 50 clusters). However, it is batch job oriented, not data oriented. If you have to wedge your problem to fit into the MapReduce paradigm, this might be a good alternative.

    > Do you need ssh connectivity between the nodes across these data centers?

    Definitely not. SSH is only used in the wrapper scripts to start the HDFS daemons. It's a usability crutch for smaller clusters that don't have proper management.

    If your ops folks don't have a better way to manage what is running on your cluster, fire them.

    Brian
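
To make the point about the wrapper scripts concrete, here is a rough sketch of what the stock start scripts amount to: read a list of worker hosts and run the per-node daemon script on each one over SSH. The code only builds the command lines and runs nothing. The install path `/opt/hadoop` and the host names are hypothetical; `bin/hadoop-daemon.sh start datanode` is the usual per-node command in releases of that era, but check your own distribution.

```python
def read_hosts(slaves_text):
    """Parse the contents of a slaves-style file: one host per line.

    Blank lines and anything after a '#' are ignored.
    """
    hosts = []
    for line in slaves_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def start_commands(hosts, hadoop_home="/opt/hadoop", daemon="datanode"):
    """Build (but do not run) one ssh command per host.

    hadoop_home is a hypothetical install location.
    """
    remote = f"{hadoop_home}/bin/hadoop-daemon.sh start {daemon}"
    return [["ssh", host, remote] for host in hosts]


if __name__ == "__main__":
    # Hypothetical host list for illustration.
    example = "node1.dc1.example.com\n# spare\nnode2.dc1.example.com\n"
    for cmd in start_commands(read_hosts(example)):
        print(" ".join(cmd))
```

A cluster management tool that starts the same daemon command locally on each node removes the SSH step entirely, which is why both replies describe SSH as a convenience of the built-in tooling rather than something the running daemons depend on.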

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 7, '11 at 5:08a
active: Jun 7, '11 at 12:58p
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop
