Hi Sam,
This implementation is intended to address another potential problem:
if the server is multi-homed and/or multi-addressed, there's no way to
know which addresses/hostnames are routable from the cluster nodes. To
that end, we look at the source IP address of the ssh connection we make
to the node and then, because we don't want to have to use an IP
address, do a reverse lookup to get a hostname.
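To make the mechanism concrete, here is a minimal Python sketch of the idea, not our actual code. It assumes the node can see the connection details in an OpenSSH-style SSH_CONNECTION value ("client_ip client_port server_ip server_port"); the function names and the injectable resolver parameter are hypothetical and only there for illustration.

```python
import socket


def client_ip_from_ssh_connection(value):
    """Return the source (client) IP from an SSH_CONNECTION-style string.

    The expected layout is "client_ip client_port server_ip server_port".
    """
    fields = value.split()
    if len(fields) != 4:
        raise ValueError("unexpected SSH_CONNECTION format: %r" % value)
    return fields[0]


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Map an IP to a hostname via reverse DNS, falling back to the IP.

    `resolver` is a hypothetical hook so the lookup can be replaced in tests.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No PTR record (or the lookup failed): the IP is all we have.
        return ip
```

On a node, this would be used roughly as reverse_lookup(client_ip_from_ssh_connection(os.environ["SSH_CONNECTION"])), which also shows the weakness: the result is only as good as the reverse DNS for that address.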
Yes, with EC2 this doesn't work if the server is restarted and the IP
and DNS change, but it's not clear what a clean automatic solution for
EC2 looks like. Asking the server what its IP and hostname are will just
yield the same IP and hostname that the node discovered using the
ssh/reverse-DNS method, so that doesn't help.
You can assign an elastic IP to the instance, but even then, basic DNS
operations will not lead you to this IP or the associated hostname. You
need to do an EC2-specific metadata query to discover the elastic
IP/hostname, which we're not really in a position to do. There is also a
race condition, because the elastic IP can't be assigned until after the
instance has started, and so on.
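For reference, the kind of EC2-specific query I mean looks roughly like the sketch below. It assumes the classic instance metadata endpoint at 169.254.169.254 with the public-hostname and public-ipv4 keys, and it only works when run on the instance itself; instances that require session tokens for metadata access would need an extra step that is not shown. The function names and the injectable fetch parameter are hypothetical.

```python
import urllib.request

# Link-local metadata endpoint, reachable only from inside the instance.
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def fetch_metadata(key, timeout=2.0):
    """Fetch one metadata value as text (plain, token-less request)."""
    with urllib.request.urlopen(METADATA_BASE + key, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def public_identity(fetch=fetch_metadata):
    """Return the public hostname and IP that the instance reports.

    `fetch` is a hypothetical hook so the HTTP call can be replaced in tests.
    """
    return {
        "hostname": fetch("public-hostname"),
        "ip": fetch("public-ipv4"),
    }
```

Even with this in place, the answer is only valid after the elastic IP has been associated, so the race described above still applies.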
Ultimately, it would come down to requiring the user to manually specify
what hostname the server should claim to have. At that point you could
type in the elastic hostname and it would work, but we've been trying to
avoid a manual solution. This is something we can consider, although I
can't promise if/when it would show up.
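If we did add a manual setting, the precedence would presumably look something like this sketch: an explicitly configured hostname wins, and the ssh/reverse-DNS result is only a fallback. This is an illustration of the idea, not an existing option; the function name, the `configured` parameter, and the `resolver` hook are all hypothetical.

```python
import socket


def choose_server_hostname(configured, source_ip, resolver=socket.gethostbyaddr):
    """Pick the hostname the agents should use to reach the server.

    `configured` is a hypothetical user-supplied override (None or empty
    when unset); `source_ip` is the server address seen by the node.
    """
    if configured and configured.strip():
        # A manual value (e.g. an elastic hostname) always takes priority.
        return configured.strip()
    try:
        # Otherwise fall back to reverse DNS on the observed source IP.
        return resolver(source_ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return source_ip
```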
Have I missed something in how you're configuring your EC2 nodes that would
allow an automated mechanism to find the 'right' hostname?
--phil
On 28 August 2012 06:38, Sam Darwin wrote:
Hi,
Problem:
It looks like the agent config file at /etc/cloudera-scm-agent/config is
using the server host name that it finds via reverse DNS lookup or
something. Now, reverse DNS is a bit esoteric for the average user; it's
often set incorrectly, or set by the service provider to something
strange, and it's out of the control of the server admin doing the
project.
With EC2, the IP of the server might change. If that happens, the
reverse-lookup DNS will change, and all the agents will then be wrong.
Solution:
If the agent is being installed from a central server, and the central
server knows its own hostname, then this value should be populated from
the server's hostname, and not from a reverse lookup. A defined
hostname (myserver1) will stay constant, even in the face of IP address
changes. And usually the Cloudera software GUI is showing
forward-lookup FQDNs.
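As a rough illustration of what "populated from the server's hostname" could mean, here is a small Python sketch that asks the local machine for its own name instead of doing a reverse lookup on a connection address. The function name is hypothetical, and what it returns depends entirely on how the host's name resolution is configured.

```python
import socket


def server_advertised_hostname():
    """Return the name the server believes it has.

    socket.getfqdn() tries to produce a fully qualified name for the local
    host; if that comes back empty, fall back to the plain hostname.
    """
    fqdn = socket.getfqdn()
    return fqdn or socket.gethostname()
```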
I apologize in advance if I have misread this whole situation.
Best Regards,
Sam