I'm trying to set up what I think would be a common Hadoop
configuration. I have 4 data nodes on an internal 10.x network. Each
of the data nodes only has access to the 10.x network. The name node
has both an internal 10.x network interface and an external interface.
I want the HDFS filesystem and job tracker to be available on the
external network, but the communication within the cluster to stay on
the 10.x network. By changing the fs.default.name configuration
parameter I can switch the filesystem from listening on the internal
interface to listening on the external one; however, the data nodes
then can't communicate with the name node. I also tried setting the
fs.default.name IP address to 0.0.0.0 to see if it would bind to all
interfaces, but that didn't seem to work.

Is it possible to configure Hadoop so that the data nodes communicate
on an internal network, but access to HDFS and the job tracker is done
through an external interface?

Any help would be much appreciated.

Thank you

Andy


  • Taeho Kang at Dec 9, 2008 at 1:10 am
    When reading from or writing to a file on HDFS, data blocks never go
    through the namenode. They are transferred directly between your client
    and the datanodes that hold the blocks.

    Hence, the datanodes must be accessible by your client. In this case,
    since your client is on an external network, your datanodes must be
    accessible from external networks.
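
The reachability argument in this reply can be sketched as a small model. This is only an illustration, not Hadoop code: the `Host` class, the `can_reach` and `can_read_file` helpers, the network names and the block layout are all hypothetical, and the model assumes two hosts can talk directly only when they share a network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    networks: frozenset  # names of the networks this host has an interface on


def can_reach(a, b):
    """Assumption: two hosts can connect directly only if they share a network."""
    return bool(a.networks & b.networks)


def can_read_file(client, namenode, block_locations):
    """Model an HDFS read: metadata from the namenode, block data from datanodes.

    block_locations holds, for each block of the file, the list of datanodes
    that store a replica of that block.
    """
    # Step 1: the client asks the namenode where the blocks live.
    if not can_reach(client, namenode):
        return False
    # Step 2: the client fetches each block directly from a datanode,
    # so every block needs at least one replica the client can reach.
    return all(
        any(can_reach(client, dn) for dn in replicas)
        for replicas in block_locations
    )


# Topology from the question: a dual-homed namenode, four internal-only datanodes.
namenode = Host("namenode", frozenset({"internal", "external"}))
datanodes = [Host(f"datanode{i}", frozenset({"internal"})) for i in range(1, 5)]
external_client = Host("external-client", frozenset({"external"}))
internal_client = Host("internal-client", frozenset({"internal"}))

# Hypothetical file with two blocks, each replicated on three datanodes.
block_locations = [datanodes[0:3], datanodes[1:4]]

print(can_read_file(external_client, namenode, block_locations))  # False
print(can_read_file(internal_client, namenode, block_locations))  # True
```

In this model the external client reaches the namenode but none of the datanodes, so the read fails even though the metadata lookup succeeds.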

  • Andy Sautins at Dec 9, 2008 at 1:51 am
    Ah. Thanks. That makes what I was trying to do sound rather
    ridiculous now, doesn't it.

    I appreciate the insight.

    Thanks

    Andy

    -----Original Message-----
    From: Taeho Kang
    Sent: Monday, December 08, 2008 6:10 PM
    To: core-user@hadoop.apache.org
    Subject: Re: internal/external interfaces for hadoop...


Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 8, '08 at 11:26p
active: Dec 9, '08 at 1:51a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
