FAQ
Hello,
I have a system of three instances: server X, which runs Cloudera
Manager; server M, the master node; and server S, the slave node. A few
days ago, for some reason, the Cloudera Agent on server S stopped
responding. The Manager complains that there is no heartbeat from server S.
So my question is:
1. How can a missing heartbeat be resolved? Are there any logs to check,
and on which server?

Also, I am considering deploying a CDH cluster to production, but given
the presence of sensitive data, what are the options if the whole cluster
crashes?
2. How can a crashed CDH cluster be recovered, e.g. when nodes no longer
restore and replicate? Is there any way to extract data from such a cluster?

Currently I am using CDH3. I tried to upgrade it to CDH4, but the upgrade
crashed midway.
3. Has anyone successfully accomplished this at all?

PS. Have you tried to register on the Cloudera Forum? It seems that it
doesn't work. Even email confirmations are sent only sometimes.

Thanks!

Gintas

To unsubscribe from this group and stop receiving emails from it, send an email to scm-users+unsubscribe@cloudera.org.


  • Adam Smieszny at Aug 30, 2013 at 2:05 pm
    1. You should first look at
    /var/log/cloudera-scm-agent/cloudera-scm-agent.log on machine S. It
    should tell you what happened to the Agent process.

    2. If your entire cluster got wiped off the map, and you couldn't access
    any of the machines, then you would lose your data. But, the point of
    Hadoop is that it will replicate your data across multiple machines. If you
    only have 1 slave node, then it can't replicate it to multiple machines.
    But, if you had 3 slave machines, then you could actually lose 2 of the
    machines and still have all of your data available, because data is
    replicated 3 times by default.

    3. Yes, I am working with a number of my clients to upgrade from CDH3 to
    CDH4. You should be able to follow the steps outlined here:
    http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_6.html
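The arithmetic behind point 2 can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function name `tolerable_node_losses` is made up here, and it assumes that HDFS keeps at most one replica of a block on any one DataNode and that the losses happen before HDFS has had time to re-replicate.

```python
def tolerable_node_losses(datanodes: int, replication: int = 3) -> int:
    """Worst-case number of DataNodes that can be lost at once while every
    block still has at least one surviving copy.

    Assumption: a block has at most one replica per DataNode, so the number
    of copies is capped by the number of DataNodes.
    """
    if datanodes < 1 or replication < 1:
        raise ValueError("datanodes and replication must both be >= 1")
    effective_copies = min(datanodes, replication)
    return effective_copies - 1


if __name__ == "__main__":
    for nodes in (1, 3, 5):
        print(nodes, "DataNode(s):", tolerable_node_losses(nodes), "loss(es) tolerated")
```

With a single slave node the result is 0, which matches the point above: there is nowhere to replicate to, so losing that node loses the data.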

    Hope this helps
    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://www.linkedin.com/in/adamsmieszny
    917.830.4156

  • Gintautas Sulskus at Aug 30, 2013 at 6:22 pm
    Thanks a lot, Adam, your answers really helped. I was stuck on these
    problems.

    Regarding the 3rd point: even with a fresh install I had problems with
    permissions, even on a fresh Ubuntu 10.04 LTS install (CDH3).
    Are there any suggested pre-installation steps to ensure a smooth cluster
    installation?


    Regards,
    Gintas




    --
    Best Regards,
    Gintautas Sulskus

  • Adam Smieszny at Aug 30, 2013 at 6:26 pm
    Which release of CDH3 were you on? CDH3u?

    Thanks,
    Adam
  • Gintautas Sulskus at Aug 30, 2013 at 6:40 pm
    The latest, CDH3u6 if I am not mistaken. By default, it fails to set up
    proper HBase permissions for different services (not an Ubuntu problem).

    We had problems with Ubuntu with earlier CDH4 versions. Next week, if the
    decision is made, I will attempt to install the latest CDH4. I will let
    you know.

    Thanks,
    Gintas

  • Gintautas Sulskus at Sep 6, 2013 at 10:55 am
    Hello again! :)

    I'm back with some news on CDH.

    Issue 1: What are the solutions to absence of heartbeat: any logs to check?
    on which server?

    Well, I've checked cloudera-scm-agent.log on the problematic node:

    Traceback (most recent call last):
      File "/usr/lib/cmf/agent/src/cmf/agent.py", line 661, in send_heartbeat
        self.master_port)
      File "/usr/lib/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 471, in __init__
        self.conn.connect()
      File "/usr/lib/python2.6/httplib.py", line 716, in connect
        self.timeout)
      File "/usr/lib/python2.6/socket.py", line 514, in create_connection
        raise error, msg
    error: [Errno 111] Connection refused
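A traceback like this one can be pulled out of a long agent log with a short script. The following is a minimal sketch, not a Cloudera tool: the function name `extract_tracebacks` is made up here, and it assumes each traceback starts on its own line with the standard Python header and ends at the first non-indented line (the exception line).

```python
import sys
from typing import List

HEADER = "Traceback (most recent call last):"


def extract_tracebacks(log_text: str) -> List[str]:
    """Return every Python traceback found in log_text, one string each."""
    tracebacks = []
    current = None  # lines of the traceback being collected, if any
    for line in log_text.splitlines():
        if line.startswith(HEADER):
            current = [line]
        elif current is not None:
            current.append(line)
            # Frame lines are indented; the first non-indented, non-empty
            # line is the exception line, which closes the traceback.
            if line and not line[0].isspace():
                tracebacks.append("\n".join(current))
                current = None
    if current is not None:  # log ended in the middle of a traceback
        tracebacks.append("\n".join(current))
    return tracebacks


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python extract_tracebacks.py /path/to/cloudera-scm-agent.log
    with open(sys.argv[1], errors="replace") as handle:
        for tb in extract_tracebacks(handle.read()):
            print(tb)
            print("-" * 40)
```

Run against /var/log/cloudera-scm-agent/cloudera-scm-agent.log on the affected node, it prints only the tracebacks, so repeated errors such as "Connection refused" are easier to spot.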

    telnet connects to the port just fine.
    netstat -atlpvn shows 2 established connections (the right number of
    servers) and the port is listening for more.
    Cloudera Manager (5.4.x) reports my node server in bad health, and any
    action on it results in a connection timeout.

    The security group rules on all servers are (no restrictions):
    all all 0.0.0.0/0
    -1 icmp 0.0.0.0/0

    Any clues what could be wrong?
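One way to narrow this down is to repeat the connection test from the agent's node using exactly the host name the agent is configured with, and to try every address that name resolves to. The sketch below is an illustration only. The function name `probe` is made up here, and it assumes the agent takes the server host and port from its own configuration (typically `server_host` and `server_port` in /etc/cloudera-scm-agent/config.ini, with 7182 as the usual port). A manual telnet test may have used a different name or an IP address than the agent does.

```python
import socket
import sys
from typing import List, Tuple


def probe(host: str, port: int, timeout: float = 3.0) -> List[Tuple[str, str]]:
    """Try a TCP connection to every address that host resolves to.

    Returns (address, outcome) pairs; outcome is "ok" or the error text.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(host, "resolution failed: %s" % exc)]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            outcome = "ok"
        except OSError as exc:  # refused, timed out, unreachable, ...
            outcome = str(exc)
        finally:
            sock.close()
        results.append((sockaddr[0], outcome))
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python probe.py <server_host from the agent config> <server_port>
    for address, outcome in probe(sys.argv[1], int(sys.argv[2])):
        print(address, "->", outcome)
```

If the name resolves to an unexpected address, or one resolved address is refused while another works, that points at name resolution rather than at the firewall rules.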


    Issue 3: I am still stuck with CDH3. Until I fix it I can't move on to
    CDH4, but I will definitely update you.

    Thanks,
    Gintas

  • Gintautas Sulskus at Sep 19, 2013 at 2:25 pm
    Hello again,

    I have fixed the issue: it came down to strange DNS issues and non-trivial
    debug output. CDH3/4 works just fine now. I haven't tried the cluster
    upgrade yet, but I definitely will.
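The thread does not say what exactly was wrong with DNS. As a general sanity check, and purely as a sketch, forward and reverse lookups can be compared on each node with the Python standard library. The helper names `dns_round_trip` and `matches` are made up for this example.

```python
import socket
from typing import Dict, List


def dns_round_trip(hostname: str) -> Dict[str, List[str]]:
    """Map each IPv4 address of hostname to the names its reverse lookup gives.

    Returns an empty dict if the forward lookup fails.
    """
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return {}
    report = {}
    for address in addresses:
        try:
            primary, aliases, _addrs = socket.gethostbyaddr(address)
            report[address] = [primary] + aliases
        except OSError:  # no reverse record for this address
            report[address] = []
    return report


def matches(hostname: str, reverse_names: List[str]) -> bool:
    """True if hostname appears among reverse_names (case and trailing dot ignored)."""
    wanted = hostname.lower().rstrip(".")
    return any(name.lower().rstrip(".") == wanted for name in reverse_names)


if __name__ == "__main__":
    fqdn = socket.getfqdn()
    print("this host calls itself:", fqdn)
    for address, names in dns_round_trip(fqdn).items():
        status = "consistent" if matches(fqdn, names) else "MISMATCH"
        print(address, names, status)
```

A mismatch, or a host name that resolves to a loopback address, is worth looking at before blaming the cluster software.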

    But now I have another problem.
    I was trying to copy data with the "distcp" command from Cluster1 to
    Cluster2 via VPN. No luck.
    I can "telnet local-cluster-1-IP 8020", but "telnet
    local-cluster-1-VPN-IP 8020" doesn't work. In other words, my two
    clusters cannot communicate with each other.

    I assume this has something to do with the Hadoop/HBase configuration,
    because non-Hadoop services are reachable.
    Any ideas?
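The thread has no answer to this question. One thing that can be checked, as an assumption rather than a diagnosis, is which local address port 8020 is actually bound to on the NameNode host. If it listens only on the local-network IP, a connection to the VPN IP would fail even though other services work. The sketch below reads the Linux IPv4 socket table. The function name `listening_addresses` is made up here, it assumes a little-endian machine such as x86, and it does not look at IPv6 sockets in /proc/net/tcp6.

```python
import socket
import struct
from typing import List

LISTEN_STATE = "0A"  # code for the LISTEN state in /proc/net/tcp


def listening_addresses(proc_net_tcp: str, port: int) -> List[str]:
    """Return the local IPv4 addresses that have a listening socket on port.

    proc_net_tcp is the text content of /proc/net/tcp.
    """
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        hex_ip, hex_port = local.split(":")
        if state != LISTEN_STATE or int(hex_port, 16) != port:
            continue
        # The address is hex in host byte order; "<I" assumes little-endian.
        found.append(socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16))))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            print(listening_addresses(handle.read(), 8020))
    except OSError as exc:
        print("could not read /proc/net/tcp:", exc)
```

A result of 0.0.0.0 means the port accepts connections on every interface, in which case the problem is more likely in routing or name resolution across the VPN. A single specific address means only that interface is served.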

    Thanks!

    Gintas


Discussion Overview
group: scm-users
categories: hadoop
posted: Aug 30, '13 at 1:13p
active: Sep 19, '13 at 2:25p
posts: 7
users: 2
website: cloudera.com
irc: #hadoop
