On Fri, May 28, 2010 at 12:06 PM, Michael Segel wrote:
You can't do that.
Unfortunately, Mike is right.
The problem is that Hadoop is going to pick up your external IP address, because that's what the machine name resolves to. Meanwhile your slave nodes are on the internal network, so you never see them.
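You can see which address the daemons will end up advertising with a quick check. This is a rough illustration, not Hadoop's exact logic (Hadoop's DNS/binding code has more knobs), but the default behavior boils down to resolving the local hostname:

```python
import socket

# Roughly what Hadoop does by default: resolve the machine's own
# hostname and advertise whatever address comes back. If your hostname
# resolves to the external interface, that's the IP the slaves will
# try to use.
hostname = socket.gethostname()
address = socket.gethostbyname(hostname)
print(f"{hostname} resolves to {address}")
```

If this prints the external IP, you've reproduced the problem described above.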
Is it a bug? Maybe. More like a design defect.
Definitely in the design defect category. The hostname handling /
binding code is... complicated and not ideal for these types of setups.
The workaround is to forget about using the second NIC for Hadoop/HBase traffic. Or make the internal network match your machine name and its DNS information, then use the second IP address to communicate with the outside world.
So if your machine name is foo, and your dominant IP address is on eth0, you want foo.company.com to resolve to eth0 and foo-ext.company.com
to resolve to eth1. It's backwards, but it should work.
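A sketch of what the corresponding /etc/hosts (or DNS) entries might look like. The addresses and the foo/foo-ext names are made up for illustration; substitute your own internal and external subnets:

```
# internal network (eth0) -- this is what the hostname and Hadoop see
10.0.0.5      foo.company.com     foo
# external network (eth1) -- reachable from the outside world
192.0.2.5     foo-ext.company.com foo-ext
```

The key point is that the bare machine name resolves to the internal address, so Hadoop's hostname resolution lands on eth0.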
IMHO, after looking at this issue, it really doesn't matter, since the cluster shouldn't be getting a lot of external traffic except on the NameNode/JobTracker nodes, which could be multi-homed.
It might be useful in the case where you're streaming data off of HDFS
directly to clients rather than in the MR or HBase case. Data import /
export comes to mind. Remember that clients establish direct
connections to DataNodes, so a multi-homed NN is insufficient. In that
case, "external" doesn't necessarily mean a public (routable) IP, but
simply another network. We've seen use cases for this in some
installations. One example is a data aggregation or ingestion network
that is separate from the Hadoop internal network, and from which
you'd like to get data into HDFS.