Hello all.

Due to limited space in our current datacenter, I am trying to move my Hadoop
cluster to a new datacenter.
In the new datacenter, each machine will keep its hostname, but each will be
assigned a new IP address.
We should be able to edit our DNS to map the existing hostnames to the new IP
addresses.

My understanding is that the namenode keeps track of a datanode by its IP
address, not its hostname
(according to the description of "private String hostName" in the
DatanodeInfo.java class).

Thus, the datanode / block info will become obsolete if the Hadoop cluster is
moved, unless we modify all the datanode / block info in the namenode.

The solutions that I can think of right now are:
1. Modify all the datanode / block info: quite risky work, I guess.
2. Have some "buffer" servers, move the data there, and then move it to
the relocated cluster in the new datacenter: but this would require some
machines with lots of free storage and very careful planning.

Any comments on my solutions or any other suggestions are welcome!
Thank you all in advance.

Regards,

Taeho

p.s. Is there any future plan to store the datanode info with a hostname
instead of an IP address?
Also, what was the motivation behind using an IP address instead of a
hostname to identify datanodes?

--
Taeho Kang [tkang.blogspot.com]
Software Engineer, NHN Corporation, Korea

  • Raghu Angadi at Oct 4, 2007 at 3:51 am

    Taeho Kang wrote:
    > My understanding is that namenode keeps track of a datanode with an ip
    > address, not a hostname.
    > (According to the description found for "private String hostName" in
    > DatanodeInfo.java class)

    The namenode keeps track of these just as a convenience for the admins. In
    fact, in trunk it does not store any datanode info. So it's OK even if
    both hostnames and IP addresses change. The only side effect is that if the
    IP addresses change, you will see all the old IPs listed under 'dead nodes'
    on the web UI (which will go away when you upgrade to 0.15.x).

    > Thus, the datanode / block info will be obsolete if the Hadoop cluster is
    > moved unless we do modify all the datanode / block info in the namenode....

    I don't think so.

    > The solutions that I can think of right now is...

    Could you state the problem again? Essentially you have to move the data
    to the new datacenter.

    Raghu.
  • Taeho Kang at Oct 4, 2007 at 5:26 am
    Thanks for your quick reply, Raghu.

    The problem I am faced with is...
    - I need to move my machines to a new location
    - The new location will assign new IP addresses to my machines.
    I am worried that this change of IP addresses may create havoc in the file
    system (i.e. discrepancies in the file block info) once the cluster starts up in
    the new location.

    Is it going to be a problem?


  • Raghu Angadi at Oct 4, 2007 at 6:15 am

    Taeho Kang wrote:
    > Thanks for your quick reply, Raghu.
    >
    > The problem I am faced with is...
    > - I need to move my machines to a new location

    Assuming this goes well (i.e. no data loss),

    > - The new location will assign new ip addresses for my machines.
    > I am worried that this change of ip addresses may create havoc in the file
    > system (i.e. discrepancy in file block info) once the cluster starts up in
    > the new location.
    >
    > Is it going to be a problem?

    Nope. A change in IP is not going to be a problem.

    Raghu.
  • Raghu Angadi at Oct 4, 2007 at 4:48 pm
    (I am not sure if I replied already...)

    Taeho Kang wrote:
    > Thanks for your quick reply, Raghu.
    >
    > The problem I am faced with is...
    > - I need to move my machines to a new location

    Assuming this goes well (i.e. no data loss),

    > - The new location will assign new ip addresses for my machines.
    > I am worried that this change of ip addresses may create havoc in the file
    > system (i.e. discrepancy in file block info) once the cluster starts up in
    > the new location.

    This is not an issue.

    > Is it going to be a problem?

    I don't think so.

    Raghu.
  • Taeho Kang at Oct 5, 2007 at 7:56 am
    Thanks again for your answer, Raghu.

    Now, just out of curiosity...

    How are the exact locations of blocks/chunks being kept track of?
    Is there any other mechanism/information that the namenode uses, other than
    ip address or hostname, to manage the whereabouts of data blocks?

  • Hairong Kuang at Oct 5, 2007 at 4:33 pm
    Block locations are not persistent data. They are not kept anywhere after a
    namenode shuts down. Instead, each datanode reports its blocks to the
    namenode at startup.

    Hairong

    -----Original Message-----
    From: Taeho Kang
    Sent: Friday, October 05, 2007 12:51 AM
    Subject: Re: Question on relocation of Hadoop cluster

    Thanks again for your answer, Raghu.

    Now, just out of curiosity...

    How are the exact locations of blocks/chunks being kept track of?
    Is there any other mechanism/information that the namenode uses, other than
    ip address or hostname, to manage the whereabouts of data blocks?
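
Hairong's point, that block locations are rebuilt from datanode reports rather than stored, can be illustrated with a toy model. The sketch below is not Hadoop code: the class ToyNameNode, its methods, the block IDs, and the IP addresses are all hypothetical, and it assumes only what the thread states (file-to-block metadata persists, block locations do not).

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model (hypothetical, not Hadoop's API): block locations live only in memory."""

    def __init__(self, namespace):
        # Persistent metadata: file name -> list of block IDs.
        self.namespace = dict(namespace)
        # Non-persistent state: block ID -> set of datanode addresses.
        self.block_locations = defaultdict(set)

    def receive_block_report(self, datanode_addr, block_ids):
        """Record that the datanode at this address holds these blocks."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_addr)

    def locate(self, filename):
        """Return (block ID, sorted addresses) for each block of a file."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[filename]]

    def restart(self):
        """Simulate shutdown and startup: the namespace survives, locations do not."""
        return ToyNameNode(self.namespace)


namespace = {"/logs/a.txt": ["blk_1", "blk_2"]}
# Blocks physically stored on each machine's disks; unchanged by the move.
disks = {"dn1": ["blk_1", "blk_2"], "dn2": ["blk_1"]}

old_ips = {"dn1": "10.0.0.1", "dn2": "10.0.0.2"}
new_ips = {"dn1": "192.168.5.1", "dn2": "192.168.5.2"}

# Old datacenter: datanodes report their blocks from the old addresses.
nn = ToyNameNode(namespace)
for host, blocks in disks.items():
    nn.receive_block_report(old_ips[host], blocks)
before = nn.locate("/logs/a.txt")

# New datacenter: the namenode restarts with an empty location map, and the
# same datanodes report the same blocks from their new addresses.
nn = nn.restart()
for host, blocks in disks.items():
    nn.receive_block_report(new_ips[host], blocks)
after = nn.locate("/logs/a.txt")
```

In this model, `before` and `after` list the same blocks for the file, differing only in the addresses attached to them, which is why nothing in the namenode has to be edited by hand after the relocation.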


Discussion Overview
group: common-user @
category: hadoop
posted: Oct 4, '07 at 1:28a
active: Oct 5, '07 at 4:33p
posts: 7
users: 3
website: hadoop.apache.org...
irc: #hadoop
