FAQ
The group in charge of the cluster is thinking about server name changes to
reflect changes in policy. What effects, if any, will this have on the
cluster itself and on Cloudera Manager 4.5? What configuration changes
will have to be made in Cloudera Manager 4.5?

Thanks for any help,
Ben


  • Adam Smieszny at Mar 11, 2013 at 10:48 pm
    Hi Ben,

    They will be changing the hostnames, but not the IP addresses? If that is
    the case, I have a procedure that I tested for just such a scenario.

    Thanks,
    Adam


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Benjamin Kim at Mar 11, 2013 at 11:38 pm
    Adam,

    Yes, that's the case. It's just changing a prefix on the names of the boxes
    themselves and in the network.

    Thanks,
    Ben
  • Adam Smieszny at Mar 12, 2013 at 8:51 pm
    Oh, I apologize; the process I used in the past was to change the IP
    address while the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, to change the hostnames
    without losing the host->service associations in CM, try the following:
    1) Stop Hadoop services via CM.
    2) Update the hostnames at the DNS or /etc/hosts level.
    3) On each machine whose hostname is changing, edit
    /etc/default/cloudera-scm-agent to set CMF_AGENT_ARGS="--host_id xxx",
    where xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed.
    5) Start Hadoop services via CM.

    I think this should leave the agents reporting the old hostnames, so you
    don't have to change anything else.
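    Step 3 can be sketched for a single host as below. This is a hedged
    illustration: the hostname is a placeholder, and the snippet edits a
    scratch copy made with mktemp rather than the real
    /etc/default/cloudera-scm-agent, so it can be tried safely.

    ```shell
    # Pin the agent to its old host identity so CM keeps the existing
    # host->role association. OLD_HOSTNAME is a hypothetical placeholder.
    OLD_HOSTNAME="old-dn1.example.com"
    CONF="$(mktemp)"   # scratch stand-in for /etc/default/cloudera-scm-agent

    # Replace an existing CMF_AGENT_ARGS line if present; otherwise append one.
    if grep -q '^CMF_AGENT_ARGS=' "$CONF"; then
      sed -i "s|^CMF_AGENT_ARGS=.*|CMF_AGENT_ARGS=\"--host_id $OLD_HOSTNAME\"|" "$CONF"
    else
      printf 'CMF_AGENT_ARGS="--host_id %s"\n' "$OLD_HOSTNAME" >> "$CONF"
    fi

    cat "$CONF"
    # On the real node you would then run (step 4):
    #   service cloudera-scm-agent restart
    ```

    On a real node, set CONF=/etc/default/cloudera-scm-agent and run as root.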

    Thanks,
    Adam
  • Benjamin Kim at Mar 12, 2013 at 9:07 pm
    Adam,

    That sounds like a good start, but what if I want the new hostnames to be
    reflected everywhere, including in CM? The new hostnames will have a
    better prefix that reflects company policy, so we want to see that.

    Thanks,
    Ben
  • Adam Smieszny at Mar 12, 2013 at 9:55 pm
    In that case, I believe the best option is to actually decommission each
    node, remove it from the cluster via the CM UI, and then re-add it with the
    new hostname.

    Depending on the size of the cluster, do a few at a time.
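    If you want to script the few-at-a-time batches, Cloudera Manager also
    exposes decommissioning through its REST API. As a hedged sketch (the
    server URL, credentials, host names, and API version below are
    placeholders to verify against your own CM 4.5 install), the request body
    for one batch could be built like this:

    ```shell
    # Build the JSON body listing the hosts to decommission in this batch
    # (hypothetical host names).
    HOSTS='old-dn1.example.com old-dn2.example.com'
    BODY="{\"items\": [$(printf '"%s",' $HOSTS | sed 's/,$//')]}"
    echo "$BODY"
    # Then, against a real CM server (URL and credentials are placeholders):
    #   curl -u admin:admin -X POST -H 'Content-Type: application/json' \
    #        -d "$BODY" http://cm-host:7180/api/v3/cm/commands/hostsDecommission
    ```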

    Thanks,
    Adam

  • Adam Smieszny at Mar 12, 2013 at 9:55 pm
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is the new,
    more descriptive hostname :)
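    In script form, the only difference from the earlier procedure is which
    name goes into --host_id. A minimal sketch, with placeholder hostnames:

    ```shell
    # Emit the agent override line for each host, using the NEW hostname as
    # the host_id so the new names are what gets reported.
    for new_name in new-dn1.example.com new-dn2.example.com; do
      printf 'CMF_AGENT_ARGS="--host_id %s"\n' "$new_name"
    done
    ```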

  • Benjamin Kim at Mar 12, 2013 at 11:11 pm
    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker, Cloudera
    Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben
  • Adam Smieszny at Mar 12, 2013 at 11:18 pm
    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

  • Benjamin Kim at Mar 12, 2013 at 11:26 pm
    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore, Hive-Server2, and
    HBaseThrift
    - 1 master server as the Active NameNode, JournalNode, and Zookeeper
    - 1 master server as the Passive NameNode, JournalNode, and Zookeeper
    - 1 master server as the JobTracker, HBaseMaster, and Zookeeper
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers

    I hope this helps.

    Thanks,
    Ben

  • Benjamin Kim at Mar 12, 2013 at 11:54 pm
    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore, Hive-Server2, and
    HBaseThrift plus the Hive and HBase clients using the embedded PostgreSQL
    8.4
    - 1 master server as the Active NameNode, JournalNode, and part of the
    Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of the
    Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode, and part of
    the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers plus
    the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben
  • Adam Smieszny at Mar 13, 2013 at 2:28 am
    What about in terms of acceptable downtime? Basically, what is your
    appetite for re-creating the cluster?

    If you want to keep the cluster in its current state and minimize downtime,
    I would suggest using the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup again
    (stepping through the add/remove node wizards and re-assigning the
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

  • Benjamin Kim at Mar 13, 2013 at 3:07 pm
    Since the cluster is relatively new, the option to recreate is there.

    But if I were to go down the decommission/re-add route, would this work
    for the master nodes, such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.?

    Thanks,
    Ben
  • Philip Langdale at Mar 13, 2013 at 4:37 pm
    Hi Ben,

    See my response here. This is how you can change your hostnames without
    having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

  • Benjamin Kim at Mar 13, 2013 at 5:32 pm
    Phil,

    Unfortunately, the hostname that will change is the first part. Right now,
    it's like sml-nn1.example.com. It will change to lrg-nn1.example.com for
    NameNode1. Plus, infrastructure will be doing the change and not us. They
    will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What do
    you think?

    If we do have to recreate, what would be the best way of backing up its
    current state and restoring it back to its original state? Or do we have to
    at all?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames without
    having to reassign roles or anything like that.


    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil


    On 13 March 2013 08:07, Benjamin Kim <bbui...@gmail.com> wrote:
    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work for
    the master nodes such as the NameNodes, JobTracker, HBaseMaster, Zookeeper,
    Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup again
    (stepping through the add/remove node wizards, re-assigning role-to-host
    mapping), then you also have the option to decommission the hosts and then
    add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore, Hive-Server2,
    and HBaseThrift plus the Hive and HBase clients using the embedded
    PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of the
    Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of the
    Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode, and part
    of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers
    plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker,
    Cloudera Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is
    the new, more descriptive hostname :)

    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny wrote:

    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim wrote:

    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny wrote:

    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change the
    hostnames without losing the association of hosts->services in CM, try the
    following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the old
    hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Philip Langdale at Mar 13, 2013 at 8:22 pm
    Hi Ben,

    No, it'll work - not all of that message is applicable in your situation,
    so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent command
    args (as described in the link)
    then everything will continue to work after the DNS names change. (Although
    note that you should really
    restart all your hosts and services due to how these things get cached in
    various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part. Right now,
    it's like sml-nn1.example.com. It will change to lrg-nn1.example.com for
    NameNode1. Plus, infrastructure will be doing the change and not us. They
    will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What do
    you think?

    If we do have to recreate, what would be the best way of backing up its
    current state and restoring it back to its original state? Or do we have to
    at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames without
    having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work
    for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup again
    (stepping through the add/remove node wizards, re-assigning role-to-host
    mapping), then you also have the option to decommission the hosts and then
    add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore, Hive-Server2,
    and HBaseThrift plus the Hive and HBase clients using the embedded
    PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of the
    Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers
    plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker,
    Cloudera Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is
    the new, more descriptive hostname :)

    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny wrote:

    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim wrote:

    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny wrote:

    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change the
    hostnames without losing the association of hosts->services in CM, try the
    following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the old
    hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Benjamin Kim at Mar 13, 2013 at 9:14 pm
    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity hostname
    "--host_id=sml-<rolename>.example.com" to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
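    The sequence Ben lists can be sketched as a dry run (hostnames are
    illustrative; the script only prints a plan and touches nothing real):

    ```shell
    PLAN=/tmp/rename-plan.txt
    HOSTS="sml-nn1.example.com sml-nn2.example.com sml-jt1.example.com"

    {
      echo '1) Stop all cluster services in the Cloudera Manager UI'
      for h in $HOSTS; do
        # Step 2: pin the old identity before the DNS change, host by host
        echo "ssh root@$h  # set CMF_AGENT_ARGS=\"--host_id $h\" in /etc/default/cloudera-scm-agent"
      done
      echo '3) Infrastructure team changes DNS (sml-* -> lrg-*)'
      for h in $HOSTS; do
        echo "ssh root@$h reboot  # 4) restart each node"
      done
      echo '5) Start all cluster services in the Cloudera Manager UI'
    } > "$PLAN"

    cat "$PLAN"
    ```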
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your situation,
    so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent command
    args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get cached in
    various places)


    --phil


    On 13 March 2013 10:32, Benjamin Kim <bbui...@gmail.com> wrote:
    Phil,

    Unfortunately, the hostname that will change is the first part. Right
    now, it's like sml-nn1.example.com. It will change to lrg-nn1.example.com
    for NameNode1. Plus, infrastructure will be doing the change and not us.
    They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What do
    you think?

    If we do have to recreate, what would be the best way of backing up its
    current state and restoring it back to its original state? Or do we have to
    at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames without
    having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work
    for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers
    plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker,
    Cloudera Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is
    the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim wrote:

    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny wrote:

    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change
    the hostnames without losing the association of hosts->services in CM, try
    the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the old
    hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Philip Langdale at Mar 13, 2013 at 9:28 pm
    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity hostname
    "--host-id=sml-<rolename>.example.com" to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your situation,
    so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent command
    args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get cached in
    various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part. Right
    now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be doing
    the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What
    do you think?

    If we do have to recreate, what would be the best way of backing up its
    current state and restoring it back to its original state? Or do we have to
    at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames without
    having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work
    for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers
    plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker,
    Cloudera Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is
    the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change
    the hostnames without losing the association of hosts->services in CM, try
    the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the
    old hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Benjamin Kim at Mar 18, 2013 at 8:58 pm
    Phil,

    It looks like I got the green light to rebuild the cluster since there is
    nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep the Hue and Oozie DBs

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben
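    The embedded-PostgreSQL step can be sketched as a dry run. Port 7432 is
    Cloudera Manager's embedded PostgreSQL, but the "hive" database and owner
    names are assumptions to verify against your own setup; note also Phil's
    reply below that dropping the metastore is likely unnecessary.

    ```shell
    PLAN=/tmp/metastore-plan.txt
    # The database/owner names "hive"/"hive" are assumptions -- check with \l first
    PSQL="psql -h localhost -p 7432 -U cloudera-scm postgres"

    # Print the planned commands instead of executing them
    {
      echo "$PSQL -c 'DROP DATABASE hive;'"
      echo "$PSQL -c 'CREATE DATABASE hive OWNER hive;'"
    } > "$PLAN"

    cat "$PLAN"
    ```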

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil


    On 13 March 2013 14:14, Benjamin Kim <bbui...@gmail.com> wrote:
    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity hostname
    "--host-id=sml-<rolename>.example.com" to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get cached
    in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part. Right
    now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be doing
    the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What
    do you think?

    If we do have to recreate, what would be the best way of backing up its
    current state and restoring it back to its original state? Or do we have to
    at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work
    for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx
    is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change
    the hostnames without losing the association of hosts->services in CM, try
    the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/host level
    3) edit /etc/default/cloudera-scm-agent on each
    of the machines with a hostname that is changing, to have
    CMF_AGENT_ARGS="--host_id xxx" where xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the
    old hostnames so you don't have to change anything else.
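    To make step 3 concrete, here is the file fragment for one machine; the
    old hostname sml-nn1.example.com is purely illustrative:

    ```shell
    # /etc/default/cloudera-scm-agent  (step 3; old hostname is illustrative)
    # Pin the agent to its old host identity so CM keeps the existing
    # host->role mapping after the DNS rename.
    CMF_AGENT_ARGS="--host_id sml-nn1.example.com"
    ```

    Then restart the agent per step 4 (e.g. service cloudera-scm-agent
    restart) so it checks in under the pinned id.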

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Philip Langdale at Mar 18, 2013 at 9:06 pm
    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore DB
    - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since there is
    nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben
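
    For the embedded-PostgreSQL step, something along these lines should
    work; port 7432 is the embedded database's default, but the database
    name ("hive") and the connecting role are assumptions -- verify with \l
    and your setup-time credentials first:

    ```shell
    # Drop only the Hive metastore from CM's embedded PostgreSQL (default
    # port 7432), leaving the Hue and Oozie databases alone. The database
    # name "hive" and the role below are assumptions -- check with \l first.
    psql -h localhost -p 7432 -U cloudera-scm -d postgres -c 'DROP DATABASE hive;'
    ```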

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity
    hostname "--host_id=sml-<rolename>.example.com"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get cached
    in various places)
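
    In file form, using those example names (a sketch, not verified against
    your setup):

    ```shell
    # /etc/default/cloudera-scm-agent on the host now named lrg-nn1.example.com:
    # keep reporting under the old identity so CM's host->role mapping survives
    CMF_AGENT_ARGS="--host_id=sml-nn1.example.com"
    ```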


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part. Right
    now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be doing
    the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What
    do you think?

    If we do have to recreate, what would be the best way of backing up
    its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this
    work for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is your
    appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of
    the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:

    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:

    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx
    is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change
    the hostnames without losing the association of hosts->services in CM, try
    the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/host level
    3) edit /etc/default/cloudera-scm-agent on
    each of the machines with a hostname that is changing, to have
    CMF_AGENT_ARGS="--host_id xxx" where xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the
    old hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156
  • Darren Lo at Mar 18, 2013 at 9:07 pm
    I think the hive metastore DB can store host names of namenodes (unless
    you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:

    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore DB
    - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since there is
    nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity
    hostname "--host_id=sml-<rolename>.example.com"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get cached
    in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part. Right
    now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion.
    What do you think?

    If we do have to recreate, what would be the best way of backing up
    its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is
    there.

    But, if I were to go down the route of decomm/re-add, would this
    work for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is
    your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim <bbui...@gmail.com> wrote:
    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben


    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny
    wrote:
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx
    is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was to
    change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to
    change the hostnames without losing the association of hosts->services in
    CM, try the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/host level
    3) edit /etc/default/cloudera-scm-agent on
    each of the machines with a hostname that is changing, to have
    CMF_AGENT_ARGS="--host_id xxx" where xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the
    old hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Thanks,
    Darren
  • Benjamin Kim at Mar 18, 2013 at 9:18 pm
    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete the
    HDFS data or DROP the Hive tables before?

    Thanks,
    Ben
    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:

    I think the hive metastore DB can store host names of namenodes (unless
    you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren


    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale <phi...@cloudera.com> wrote:
    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore
    DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil


    On 18 March 2013 13:58, Benjamin Kim <bbui...@gmail.com> wrote:
    Phil,

    It looks like I got the green light to rebuild the cluster since there
    is nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity
    hostname "--host_id=sml-<rolename>.example.com"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get
    cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part.
    Right now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion.
    What do you think?

    If we do have to recreate, what would be the best way of backing up
    its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:

    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is
    there.

    But, if I were to go down the route of decomm/re-add, would this
    work for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is
    your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize
    downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup
    again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:

    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:

    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben


    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny
    wrote:
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too. The new hostnames will
    have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was
    to change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to
    change the hostnames without losing the association of hosts->services in
    CM, try the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/host level
    3) edit /etc/default/cloudera-scm-agent on
    each of the machines with a hostname that is changing, to have
    CMF_AGENT_ARGS="--host_id xxx" where xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting
    the old hostnames so you don't have to change anything else.

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Thanks,
    Darren
  • Darren Lo at Mar 18, 2013 at 9:56 pm
    If you are using HA, did you run the Hive update namenode locations
    command? It's a Hive service command that helps you run the hive --service
    metatool command, which will change Hive's metastore to reference namenodes
    by nameservice instead of by hostname. The wizard does not run it
    automatically, but it does prompt you to manually take a backup of the hive
    metastore db and then run this update namenode locations command.

    You might be able to keep your HDFS data for Hive and hive schema by using
    this command (or manually playing around with hive --service metatool, but
    the command is easier). Just be sure to pick the same nameservice in your
    new HDFS setup. Worst case, you can run through everything and, if it
    doesn't work, wipe your Hive warehouse in HDFS and your Hive schema at
    the end.

    Either way probably a good idea to back up your hive metastore db before
    you do your re-install.
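
    A rough sketch of that, with the nameservice name, old NameNode URI, and
    database name all assumed for illustration:

    ```shell
    # 1. Back up the metastore first (embedded PostgreSQL on its default
    #    port 7432; the database name "hive" is an assumption)
    pg_dump -h localhost -p 7432 -U cloudera-scm hive > hive_metastore_backup.sql

    # 2. See which filesystem root(s) the metastore currently references
    hive --service metatool -listFSRoot

    # 3. Rewrite locations from the old NameNode URI to the HA nameservice
    #    (metatool takes the new location first, then the old one)
    hive --service metatool -updateLocation hdfs://nameservice1 hdfs://sml-nn1.example.com:8020
    ```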

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:18 PM, Benjamin Kim wrote:

    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete the
    HDFS data or DROP the Hive tables before?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:

    I think the hive metastore DB can store host names of namenodes (unless
    you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:

    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore
    DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since there
    is nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity
    hostname "--host_id=sml-<rolename>.example.com"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get
    cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part.
    Right now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion.
    What do you think?

    If we do have to recreate, what would be the best way of backing up
    its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben


    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale
    wrote:
    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is
    there.

    But, if I were to go down the route of decomm/re-add, would this
    work for the master nodes such as the NameNodes, JobTracker, HBaseMaster,
    Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:

    What about in terms of acceptable downtime? Basically, what is
    your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and
    minimize downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster
    setup again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim <bbui...@gmail.com> wrote:
    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part
    of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode,
    and part of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode,
    and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny
    wrote:
    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you
    afford?

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben


    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny
    wrote:
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the new
    hostnames to be reflected everywhere and in CM too? The new hostnames will
    have a better prefix to reflect company policies, so we want to see that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny
    wrote:
    Oh, I apologize, the process I exercised in the past was
    to change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to
    change the hostnames without losing the association of hosts->services in
    CM, try the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting
    the old hostnames so you don't have to change anything else.
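
    The edit in steps 3 and 4 can be sketched as a shell snippet. This is a
    hedged illustration rather than a tested procedure: it stages the change
    in a local copy of the defaults file (on a real host you would edit
    /etc/default/cloudera-scm-agent as root), and "sml-nn1.example.com" is
    just an example old hostname.

```shell
# Stage the --host_id override in a local copy of the agent defaults file.
# On a real host, edit /etc/default/cloudera-scm-agent as root instead.
OLD_HOSTNAME="sml-nn1.example.com"          # example old hostname
DEFAULTS="./cloudera-scm-agent.defaults"    # stand-in for the real file

touch "$DEFAULTS"
echo "CMF_AGENT_ARGS=\"--host_id ${OLD_HOSTNAME}\"" >> "$DEFAULTS"

# Show the line the agent init script would pick up on restart
# (step 4 on a real host: service cloudera-scm-agent restart).
grep CMF_AGENT_ARGS "$DEFAULTS"
```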

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Thanks,
    Darren

  • Benjamin Kim at Mar 18, 2013 at 10:00 pm
    Darren,

    The name service is in the Hive metadata, and I will not be changing the
    name service name. It will remain the same. We actually don't care about
    the Hive tables or the Hive data. This data is just test data. It will be
    cleared out eventually.

    Thanks,
    Ben
    On Monday, March 18, 2013 2:56:01 PM UTC-7, Darren Lo wrote:

    If you are using HA, did you run the Hive update namenode locations
    command? It's a Hive service command that helps you run the hive --service
    metatool command, which will change Hive's metastore to reference namenodes
    by nameservice instead of by hostname. The wizard does not run it
    automatically, but it does prompt you to manually take a backup of the hive
    metastore db and then run this update namenode locations command.

    You might be able to keep your HDFS data for Hive and hive schema by using
    this command (or manually playing around with hive --service metatool, but
    the command is easier). Just be sure to pick the same nameservice in your
    new HDFS setup. Worst case, you can try running through everything and,
    if it doesn't work, wipe your hive warehouse in HDFS and your hive schema
    at the end.

    Either way, it's probably a good idea to back up your hive metastore db
    before you do your re-install.
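
    For reference, the underlying metatool invocations look roughly like the
    following. This is a hedged sketch: "nameservice1" is a hypothetical
    nameservice, the old NameNode URI reuses the example hostname from this
    thread, and the commands need a host with Hive installed, so don't run
    them without a metastore backup.

```shell
# Show the filesystem root(s) currently recorded in the metastore.
hive --service metatool -listFSRoot

# Rewrite stored locations from the hostname-based URI to the nameservice
# URI (new location first, old location second; both are example values).
hive --service metatool -updateLocation \
    hdfs://nameservice1 hdfs://sml-nn1.example.com:8020
```

    The CM "update namenode locations" command wraps these steps for you.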

    Thanks,
    Darren


    On Mon, Mar 18, 2013 at 2:18 PM, Benjamin Kim <bbui...@gmail.com>
    wrote:
    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete
    the HDFS data or DROP the Hive tables before?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:

    I think the hive metastore DB can store host names of namenodes (unless
    you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:

    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore
    DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since there
    is nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations
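
    The embedded PostgreSQL step might look like the following. This is a
    hypothetical sketch: the port, database name, user, and owner are all
    assumptions (CM's embedded PostgreSQL usually listens on 7432), so verify
    the real values in /etc/cloudera-scm-server/db.properties and the Hive
    service's database settings before running anything.

```shell
# Drop and recreate only the Hive metastore DB; the Hue and Oozie DBs are
# left untouched. All names and the port here are assumptions.
psql -h localhost -p 7432 -U cloudera-scm -c 'DROP DATABASE hive;'
psql -h localhost -p 7432 -U cloudera-scm -c 'CREATE DATABASE hive OWNER hive;'
```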

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old
    identity hostname "--host_id=sml-<rolename>.example.com"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:

    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get
    cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part.
    Right now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion.
    What do you think?

    If we do have to recreate, what would be the best way of backing
    up its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben


  • Benjamin Kim at Mar 18, 2013 at 10:20 pm
    Darren,

    I would like to ask: will deleting the HDFS service delete all the HDFS
    data anyway? What will happen if I install the HDFS service using the old
    HDFS directories and data that already exist on the disks? Will HDFS
    automatically see it?

    Thanks,
    Ben
  • Darren Lo at Mar 18, 2013 at 10:22 pm
    Hi Ben,

    In general, deleting CM services does not delete the underlying data. Your
    HDFS data will remain in the data dirs on your machines when you delete
    your HDFS service. When you set up your new cluster, specify the same data
    dirs and you'll see your data re-appear.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 3:20 PM, Benjamin Kim wrote:

    Darren,

    I would like to ask if deleting the HDFS service will delete all the HDFS
    data anyways? What will happen if I install the HDFS service using the old
    HDFS directories and data that already exist on the disks? Will HDFS
    automatically see it?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:56:01 PM UTC-7, Darren Lo wrote:

    If you are using HA, did you run the Hive update namenode locations
    command? It's a Hive service command that helps you run the hive --service
    metatool command, which will change Hive's metastore to reference namenodes
    by nameservice instead of by hostname. The wizard does not run it
    automatically, but it does prompt you to manually take a backup of the hive
    metastore db and then run this update namenode locations command.

    You might be able to keep your HDFS data for Hive and hive schema by
    using this command (or manually playing around with hive --service
    metatool, but the command is easier). Just be sure to pick the same
    nameservice in your new HDFS setup. Worst case scenario you can try running
    through everything and if it doesn't work, then you can wipe your hive
    warehouse in HDFS and your hive schema at the end.

    Either way probably a good idea to back up your hive metastore db before
    you do your re-install.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:18 PM, Benjamin Kim wrote:

    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete
    the HDFS data or DROP the Hive tables before?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:

    I think the hive metastore DB can store host names of namenodes (unless
    you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:

    Hi Ben,

    That's all fine, but there's no particular reason to drop the
    metastore DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since
    there is nothing critical on there. Can you see if this sounds like a good
    plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-**agen****t and add the old
    identity hostname "--host-id=sml-<rolename>.exam******ple.com<http://example.com>"
    to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale
    wrote:
    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent
    command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get
    cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part.
    Right now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion.
    What do you think?

    If we do have to recreate, what would be the best way of backing
    up its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben


    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale
    wrote:
    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/**cl********
    oudera.org/d/msg/scm-users/**m2U********9m4BfH0w/lGoq_UvOs-oJ<https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ>

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is
    there.

    But, if I were to go down the route of decomm/re-add, would
    this work for the master nodes such as the NameNodes, JobTracker,
    HBaseMaster, Zookeeper, Hue, CM, etc.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny
    wrote:
    What about in terms of acceptable downtime? Basically, what is
    your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and
    minimize downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster
    setup again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and
    part of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode,
    and part of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster,
    JournalNode, and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny
    wrote:
    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you
    afford?

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    Thanks for the info. Will this work for the NameNodes,
    JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change
    too.

    Cheers,
    Ben


    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny
    wrote:
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the new, more descriptive hostname :)


    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny <
    ad...@cloudera.com> wrote:
    In that case, I believe the best option is to actually
    decommission each node, remove it from the cluster via the CM UI, and then
    re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    That sounds like a good start, but what if I want the
    new hostnames to be reflected everywhere and in CM too. The new hostnames
    will have a better prefix to reflect company policies; so, we want to see
    that.

    Thanks,
    Ben


    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam
    Smieszny wrote:
    Oh, I apologize, the process I exercised in the past
    was to change the IP address when the hostname stayed constant. That is
    easy.

    Per previous threads on the mailing list, in order to
    change the hostnames without losing the association of hosts->services in
    CM, try the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/host level
    3) Edit /etc/default/cloudera-scm-agent on each of the machines with a
    hostname that is changing, to have CMF_AGENT_ARGS="--host_id xxx" where
    xxx is the old hostname.
    4) Restart cloudera-scm-agent on each machine you
    changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting
    the old hostnames so you don't have to change anything else.
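    The agent-side change in step 3 is easy to script. A minimal sketch, assuming the defaults file and flag are exactly as named above (the helper name and the example hostname are hypothetical):

    ```shell
    # set_agent_host_id FILE OLD_ID: pin the agent's host identity by appending
    # CMF_AGENT_ARGS to the agent defaults file, unless host_id is already set
    # (so re-running the script is harmless).
    set_agent_host_id() {
      conf="$1"
      old_id="$2"
      if ! grep -q -- '--host_id' "$conf"; then
        printf 'CMF_AGENT_ARGS="--host_id %s"\n' "$old_id" >> "$conf"
      fi
    }

    # On each machine whose hostname is changing (old FQDN is per-host):
    #   set_agent_host_id /etc/default/cloudera-scm-agent sml-nn1.example.com
    #   service cloudera-scm-agent restart
    ```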

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

    --
    Thanks,
    Darren
  • Benjamin Kim at Mar 18, 2013 at 11:04 pm
    Darren,

    So, if I leave the nameservice name, Hive metastore, Hive warehouse
    directory, and the HDFS data directories as they are, everything will be
    back to how it was before? The permissions will be the same?

    Another consideration I forgot to mention is that we are using CDH4.1.3,
    and we will be rebuilding the cluster using CDH4.2.

    Thanks,
    Ben
    On Monday, March 18, 2013 3:22:29 PM UTC-7, Darren Lo wrote:

    Hi Ben,

    In general, deleting CM services does not delete the underlying data. Your
    HDFS data will remain in the data dirs on your machines when you delete
    your HDFS service. When you set up your new cluster, specify the same data
    dirs and you'll see your data re-appear.

    Thanks,
    Darren


    On Mon, Mar 18, 2013 at 3:20 PM, Benjamin Kim <bbui...@gmail.com<javascript:>
    wrote:
    Darren,

    I would like to ask if deleting the HDFS service will delete all the HDFS
    data anyway. What will happen if I install the HDFS service using the old
    HDFS directories and data that already exist on the disks? Will HDFS
    automatically see it?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:56:01 PM UTC-7, Darren Lo wrote:

    If you are using HA, did you run the Hive update namenode locations
    command? It's a Hive service command that helps you run the hive --service
    metatool command, which will change Hive's metastore to reference namenodes
    by nameservice instead of by hostname. The wizard does not run it
    automatically, but it does prompt you to manually take a backup of the hive
    metastore db and then run this update namenode locations command.

    You might be able to keep your HDFS data for Hive and hive schema by
    using this command (or manually playing around with hive --service
    metatool, but the command is easier). Just be sure to pick the same
    nameservice in your new HDFS setup. Worst case scenario you can try running
    through everything and if it doesn't work, then you can wipe your hive
    warehouse in HDFS and your hive schema at the end.

    Either way probably a good idea to back up your hive metastore db before
    you do your re-install.
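    A sketch of the backup and metatool invocations described above; the URIs are hypothetical examples, and the pg_dump line assumes CM's embedded PostgreSQL on its default port (adjust host/port/user for your deployment):

    ```shell
    # URIs are hypothetical -- substitute your old namenode and the nameservice
    # you pick for the new HDFS setup.
    OLD_FS_ROOT="hdfs://sml-nn1.example.com:8020"
    NEW_FS_ROOT="hdfs://nameservice1"

    # 1) Back up the metastore DB first (embedded PostgreSQL assumed):
    #      pg_dump -h localhost -p 7432 -U hive hive > hive_metastore_backup.sql
    #
    # 2) Inspect the filesystem roots the metastore currently records:
    #      hive --service metatool -listFSRoot
    #
    # 3) Rewrite them to the nameservice (-dryRun previews the change first):
    update_cmd="hive --service metatool -updateLocation $NEW_FS_ROOT $OLD_FS_ROOT -dryRun"
    echo "$update_cmd"   # run it without -dryRun once the preview looks right
    ```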

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:18 PM, Benjamin Kim wrote:

    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete
    the HDFS data or DROP the Hive tables before?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:

    I think the hive metastore DB can store host names of namenodes
    (unless you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:

    Hi Ben,

    That's all fine, but there's no particular reason to drop the
    metastore DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:

    Phil,

    It looks like I got the green light to rebuild the cluster since
    there is nothing critical on there. Can you see if this sounds like a good
    plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep the Hue and Oozie DBs

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:

    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:

    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity
    hostname "--host_id=sml-<rolename>.example.com" to the variable
    CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale
    wrote:
    Hi Ben,

    No, it'll work - not all of that message is applicable in your
    situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your
    agent command args (as described in the link)
    then everything will continue to work after the DNS names change.
    (Although note that you should really
    restart all your hosts and services due to how these things get
    cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:

    Phil,

    Unfortunately, the hostname that will change is the first part.
    Right now, it's like sml-nn1.example.com. It will change to
    lrg-nn1.example.com for NameNode1. Plus, infrastructure will be
    doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my
    opinion. What do you think?

    If we do have to recreate, what would be the best way of backing
    up its current state and restoring it back to its original state? Or do we
    have to at all?

    Thanks,
    Ben


    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale
    wrote:
    Hi Ben,

    See my response here. This is how you can change your hostnames
    without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:

    Since the cluster is relatively new, the option to recreate is
    there.

    But, if I were to go down the route of decomm/re-add, would
    this work for the master nodes such as the NameNodes, JobTracker,
    HBaseMaster, Zookeeper, Hue, CM, etc.?

    Thanks,
    Ben


    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny
    wrote:
    What about in terms of acceptable downtime? Basically, what
    is your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and
    minimize downtime, I would suggest using the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster
    setup again (stepping through the add/remove node wizards, re-assigning
    role-to-host mapping), then you also have the option to decommission the
    hosts and then add them again.

    Thanks,
    Adam


    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim <
    bbui...@gmail.com> wrote:
    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore,
    Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the
    embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and
    part of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode,
    and part of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster,
    JournalNode, and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and
    HRegionServers plus the Impala Daemons

    The OS on all these boxes is CentOS 6.3.

    Thanks,
    Ben

  • Darren Lo at Mar 18, 2013 at 11:15 pm
    Hi Ben,

    With the same CDH version, that should work. Changing CDH versions will
    probably not work; you should go through a CDH upgrade.

    If you want to preserve your data, you should rebuild the cluster with the
    same CDH version and then do a CDH upgrade (which includes a manual upgrade
    of the Hive metastore db, follow the guide here:
    https://ccp.cloudera.com/display/FREE45DOC/Upgrading+to+the+Latest+Version+of+CDH4+in+a+Cloudera+Managed+Deployment
    ).

    If it takes you more than a couple of hours to re-create the data, it's

    Thanks,
    Darren

  • Benjamin Kim at Mar 19, 2013 at 12:08 am
    Darren,

    It looks like we will upgrade to CDH4.2 before the rebuild. We are ok.

    Thanks,
    Ben
    On Mar 18, 2013, at 4:15 PM, Darren Lo wrote:

    Hi Ben,

    With the same CDH version that should work. Changing CDH versions will probably not work, you should go through a CDH upgrade.

    If you want to preserve your data, you should rebuild the cluster with the same CDH version and then do a CDH upgrade (which includes a manual upgrade of the Hive metastore db, follow the guide here: https://ccp.cloudera.com/display/FREE45DOC/Upgrading+to+the+Latest+Version+of+CDH4+in+a+Cloudera+Managed+Deployment).

    If it takes you more than a couple hours to re-create the data, it's probably worth going the upgrade route.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 4:04 PM, Benjamin Kim wrote:
    Darren,

    So; if I leave the name service name, Hive metastore, Hive warehouse directory, and the HDFS data directories; everything will be back to how it was before? The permissions will be same?

    Another consideration I forgot to mention is that we are using CDH4.1.3, and we will be rebuilding the cluster using CDH4.2.

    Thanks,
    Ben
    On Monday, March 18, 2013 3:22:29 PM UTC-7, Darren Lo wrote:
    Hi Ben,

    In general, deleting CM services does not delete the underlying data. Your HDFS data will remain in the data dirs on your machines when you delete your HDFS service. When you set up your new cluster, specify the same data dirs and you'll see your data re-appear.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 3:20 PM, Benjamin Kim wrote:
    Darren,

    I would like to ask if deleting the HDFS service will delete all the HDFS data anyways? What will happen if I install the HDFS service using the old HDFS directories and data that already exist on the disks? Will HDFS automatically see it?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:56:01 PM UTC-7, Darren Lo wrote:
    If you are using HA, did you run the Hive update namenode locations command? It's a Hive service command that helps you run the hive --service metatool command, which will change Hive's metastore to reference namenodes by nameservice instead of by hostname. The wizard does not run it automatically, but it does prompt you to manually take a backup of the hive metastore db and then run this update namenode locations command.

    You might be able to keep your HDFS data for Hive and hive schema by using this command (or manually playing around with hive --service metatool, but the command is easier). Just be sure to pick the same nameservice in your new HDFS setup. Worst case scenario you can try running through everything and if it doesn't work, then you can wipe your hive warehouse in HDFS and your hive schema at the end.

    Either way probably a good idea to back up your hive metastore db before you do your re-install.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:18 PM, Benjamin Kim wrote:
    Darren,

    We are using HA so dropping it would be prudent. Also, should I delete the HDFS data or DROP the Hive tables before?

    Thanks,
    Ben

    On Monday, March 18, 2013 2:07:49 PM UTC-7, Darren Lo wrote:
    I think the hive metastore DB can store host names of namenodes (unless you're using HA), so you're probably better off dropping it.

    Thanks,
    Darren

    On Mon, Mar 18, 2013 at 2:06 PM, Philip Langdale wrote:
    Hi Ben,

    That's all fine, but there's no particular reason to drop the metastore DB - it shouldn't contain any information that's
    going to be invalidated by the host shuffle. I would just leave it be.

    --phil

    On 18 March 2013 13:58, Benjamin Kim wrote:
    Phil,

    It looks like I got the green light to rebuild the cluster since there is nothing critical on there. Can you see if this sounds like a good plan?

    In Cloudera Manager:
    1. Stop all cluster services
    2. Delete all services from the cluster
    3. Delete all hosts

    In Embedded PostgreSQL:
    1. Drop the Hive metastore but keep Hue and Oozie DB's

    Back in Cloudera Manager:
    1. Add all the new hosts
    2. Assign the proper roles
    3. Recreate the Hive metastore DB
    4. Apply service configurations

    Thanks,
    Ben

    On Wednesday, March 13, 2013 2:28:44 PM UTC-7, Philip Langdale wrote:
    Yep.

    --phil

    On 13 March 2013 14:14, Benjamin Kim wrote:
    Phil,

    It looks like that's the way to go.

    On each server in the cluster, we would:
    1. Stop all cluster services in Cloudera Manager
    2. Edit /etc/default/cloudera-scm-agent and add the old identity hostname "--host-id=sml-<rolename>.example.com" to the variable CMF_AGENT_ARGS.
    3. Change the hostnames of all the cluster nodes in DNS
    4. Restart all the cluster nodes
    5. Start all cluster services in Cloudera Manager

    Does this sound right?

    Thanks,
    Ben
    On Wednesday, March 13, 2013 1:21:57 PM UTC-7, Philip Langdale wrote:
    Hi Ben,

    No, it'll work - not all of that message is applicable in your situation, so let me try and clarify.

    Let's take your example host.

    Old hostname: sml-nn1.example.com
    New hostname: lrg-nn1.example.com

    If you, today, add a --host_id=sml-nn1.example.com to your agent command args (as described in the link)
    then everything will continue to work after the DNS names change. (Although note that you should really
    restart all your hosts and services due to how these things get cached in various places)


    --phil

    On 13 March 2013 10:32, Benjamin Kim wrote:
    Phil,

    Unfortunately, the hostname that will change is the first part. Right now, it's like sml-nn1.example.com. It will change to lrg-nn1.example.com for NameNode1. Plus, infrastructure will be doing the change and not us. They will do all the DNS hosts stuff.

    It looks like we will have to recreate the cluster in my opinion. What do you think?

    If we do have to recreate, what would be the best way of backing up its current state and restoring it back to its original state? Or do we have to at all?

    Thanks,
    Ben

    On Wednesday, March 13, 2013 9:37:22 AM UTC-7, Philip Langdale wrote:
    Hi Ben,

    See my response here. This is how you can change your hostnames without having to reassign roles or anything like that.

    https://groups.google.com/a/cloudera.org/d/msg/scm-users/m2U9m4BfH0w/lGoq_UvOs-oJ

    --phil

    On 13 March 2013 08:07, Benjamin Kim wrote:
    Since the cluster is relatively new, the option to recreate is there.

    But, if I were to go down the route of decomm/re-add, would this work for the master nodes such as the NameNodes, JobTracker, HBaseMaster, Zookeeper, Hue, CM, etc.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 7:28:09 PM UTC-7, Adam Smieszny wrote:
    What about in terms of acceptable downtime? Basically, what is your appetite to re-create the cluster?

    If you want to keep the cluster in its current state and minimize downtime, I would suggest to use the CMF_AGENT_ARGS method.

    If, on the other hand, you can afford to go through cluster setup again (stepping through the add/remove node wizards, re-assigning role-to-host mapping), then you also have the option to decommission the hosts and then add them again.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:54 PM, Benjamin Kim wrote:
    Adam,

    I forgot some things. Let me reiterate.

    We have:
    - 1 gateway server with CM, Hue, HttpFS, Hive-Metastore, Hive-Server2, and HBaseThrift plus the Hive and HBase clients using the embedded PostgreSQL 8.4
    - 1 master server as the Active NameNode, JournalNode, and part of the Zookeeper Quorum
    - 1 master server as the Passive NameNode, JournalNode, and part of the Zookeeper Quorum
    - 1 master server as the JobTracker, HBaseMaster, JournalNode, and part of the Zookeeper Quorum plus the Impala StateStore
    - 6 slave servers as the DataNodes, TaskTrackers, and HRegionServers plus the Impala Daemons

    The OS on all these boxes are CentOS 6.3.

    Thanks,
    Ben
    On Tuesday, March 12, 2013 4:18:41 PM UTC-7, Adam Smieszny wrote:
    How many nodes are in your cluster?
    How much data do you have, and how much downtime can you afford?

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 7:11 PM, Benjamin Kim wrote:
    Adam,

    Thanks for the info. Will this work for the NameNodes, JobTracker, Cloudera Manager server, HBaseMaster, etc.? These will change too.

    Cheers,
    Ben

    On Tuesday, March 12, 2013 2:55:31 PM UTC-7, Adam Smieszny wrote:
    Or you could set the CMF_AGENT_ARGS="--host_id xxx" where xxx is the new, more descriptive hostname :)

    On Tue, Mar 12, 2013 at 5:54 PM, Adam Smieszny wrote:
    In that case, I believe the best option is to actually decommission each node, remove it from the cluster via the CM UI, and then re-add it with the new hostname.

    Depending on the size of the cluster, do a few at a time.

    Thanks,
    Adam

    On Tue, Mar 12, 2013 at 5:07 PM, Benjamin Kim wrote:
    Adam,

    That sounds like a good start, but what if I want the new hostnames to be reflected everywhere, including in CM? The new hostnames will have a better prefix to reflect company policies; so, we want to see that.

    Thanks,
    Ben

    On Tuesday, March 12, 2013 1:51:50 PM UTC-7, Adam Smieszny wrote:

    Oh, I apologize, the process I exercised in the past was to change the IP address when the hostname stayed constant. That is easy.

    Per previous threads on the mailing list, in order to change the hostnames without losing the association of hosts->services in CM, try the following:
    1) Stop Hadoop services via CM
    2) Edit the hostnames at the DNS or /etc/hosts level
    3) Edit /etc/default/cloudera-scm-agent on each machine whose hostname is changing, to have CMF_AGENT_ARGS="--host_id xxx" where xxx is the old hostname
    4) Restart cloudera-scm-agent on each machine you changed
    5) Start Hadoop services via CM

    I think this should leave you with the Agents reporting the old hostnames so you don't have to change anything else.
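    In shell terms, step 3 above amounts to something like the following sketch. The hostname and scratch file path here are hypothetical; on a real node you would edit the actual /etc/default/cloudera-scm-agent as root, then restart the agent per step 4.

    ```shell
    # Hedged sketch of step 3, assuming the agent defaults file named in the
    # steps above. For illustration this writes to a scratch copy instead of
    # the real /etc/default/cloudera-scm-agent.
    OLD_HOSTNAME="node01.example.com"                        # hypothetical old name
    CONF_FILE="${TMPDIR:-/tmp}/cloudera-scm-agent.defaults"  # scratch stand-in

    # Pin the agent's host_id to the OLD hostname so CM keeps the existing
    # host -> role association after the DNS/hostname change.
    printf 'CMF_AGENT_ARGS="--host_id %s"\n' "$OLD_HOSTNAME" > "$CONF_FILE"
    cat "$CONF_FILE"
    # On the real node, follow with: service cloudera-scm-agent restart
    ```

    The point of pinning the old name is that the agent keeps reporting the identity CM already knows, so no role reassignment is needed.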

    Thanks,
    Adam


    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156





    --
    Thanks,
    Darren



Discussion Overview
group: scm-users
categories: hadoop
posted: Mar 11, '13 at 5:12p
active: Mar 19, '13 at 12:08a
posts: 29
users: 4
website: cloudera.com
irc: #hadoop
