Hi Shouvanik,

Is the agent running on your various hosts? ("service cloudera-scm-agent
status" or "ps aux | grep agent.py" might tell you.) If it is, what do
its logs in /var/log/cloudera-scm-agent say?
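As a rough equivalent of "ps aux | grep agent.py", the Python sketch below scans /proc/<pid>/cmdline on a Linux host for a pattern. It assumes the agent's command line contains "agent.py", as the suggestion above implies; `find_pids` is a hypothetical helper, not part of Cloudera Manager. Because it reads /proc directly, there is no separate grep process to show up in the results.

```python
import os


def find_pids(pattern, proc_root="/proc"):
    """Return the sorted PIDs whose command line contains `pattern` (Linux /proc)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or its cmdline is unreadable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)
```

For example, an empty result from `find_pids("agent.py")` would suggest the agent is not running on that host.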

-- Philip
On Fri, Nov 30, 2012 at 11:00 PM, Shouvanik Haldar wrote:

Hi Philip,

Please help me. I am not able to see my hosts even after installation. Can
you please give me a checklist of the things I need to take care of
before doing this?

Regards,
Shouvanik

On Saturday, 5 May 2012 07:17:12 UTC+5:30, Philip Zeyliger wrote:

Hi Joost,

There are situations where the manager will talk to the agent over port
9000. In the Free Edition, it'll do that for viewing log files, so, yes,
you should open it up.
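To check from the Cloudera Manager server whether an agent's port 9000 gets through iptables, a plain TCP connection attempt is usually enough. This is a hedged Python sketch; `check_tcp_port` is a hypothetical helper, and the mapping from outcome to cause is a common pattern rather than a guarantee: a refused connection can mean nothing is listening or that a firewall rule rejects it, and a timeout often means packets are being silently dropped.

```python
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Attempt a TCP connection to host:port and classify the result.

    "open"     - something accepted the connection
    "refused"  - nothing listening, or a firewall rule rejecting the connection
    "timeout"  - often a firewall silently dropping packets
    otherwise  - the OS error text, e.g. for an unreachable host
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return exc.strerror or str(exc)
```

For example, `check_tcp_port("cdhnode1.diversit.local", 9000)` run on the manager host should return "open" once the agent is up and the port is allowed.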

-- Philip
On Mon, Apr 30, 2012 at 12:39 AM, Joost den Boer wrote:

Philip, Harsh,

Thanks for your quick reply.
Yes, fixing the /etc/hosts helped.
Now it contains:
127.0.0.1 localhost
192.168.1.221 cdhnode1.diversit.local cdhnode1

I also changed localhost in /etc/sysconfig/network.

I thought I had read somewhere that the 127.0.0.1 line had to be removed and
that /etc/hosts should have only one line. That's why I had it the way it was.
In another post I read to test with 'host -v -t A `hostname`', and that
resolved to the correct FQDN, so I thought that would not be the
problem.
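One likely reason the 'host -v -t A `hostname`' test passed is that `host` queries DNS servers directly and does not read /etc/hosts, while most programs (Java daemons included) resolve names through the system resolver, which typically consults /etc/hosts first according to /etc/nsswitch.conf. The Python sketch below goes through that system resolver instead; `resolves_to_loopback` and `resolution_report` are hypothetical helper names.

```python
import ipaddress
import socket


def resolves_to_loopback(name):
    """True if `name` resolves, via the system resolver, to a loopback IPv4 address."""
    return ipaddress.ip_address(socket.gethostbyname(name)).is_loopback


def resolution_report():
    """Summarize how this host's own names resolve through the system resolver."""
    short = socket.gethostname()
    return {
        "hostname": short,
        "fqdn": socket.getfqdn(),
        "hostname_ip": socket.gethostbyname(short),
        # A cluster node's own name should generally map to its real address...
        "hostname_is_loopback": resolves_to_loopback(short),
        # ...while localhost should map to loopback.
        "localhost_is_loopback": resolves_to_loopback("localhost"),
    }
```

On a correctly configured node, `resolution_report()` would be expected to show `hostname_is_loopback` as False and `localhost_is_loopback` as True.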

Anyway, thanks again for your help.
P.S. I see the agent opens port 9000. Is the manager talking to the
agent over this port? If so, should this port be opened in iptables?

Regards,
Joost


On Sun, Apr 29, 2012 at 10:01 PM, Philip Zeyliger wrote:

On Sun, Apr 29, 2012 at 1:00 PM, Harsh J wrote:

Additionally, I'd like to note that in:

192.168.1.221 cdhnode1 cdhnode1.diversit.local localhost

1. The "localhost" seems out of place. It belongs with 127.0.0.1 alone.
Ah; I missed this in your original message. This explains what you're
seeing, and is wrong. localhost should always point to 127.0.0.1.
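The two rules from this exchange (localhost belongs on the loopback line only, and some loopback line should carry localhost) are easy to check mechanically. A minimal Python sketch that lints /etc/hosts text; `lint_hosts` is a hypothetical helper and checks only those two rules, not every possible misconfiguration.

```python
import ipaddress


def lint_hosts(text):
    """Return a list of problems found in /etc/hosts-style text.

    Checks only two rules: "localhost" must not appear on a non-loopback
    address, and some loopback line must carry "localhost".
    """
    problems = []
    localhost_on_loopback = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        addr, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(addr).is_loopback
        except ValueError:
            problems.append(f"line {lineno}: invalid address {addr!r}")
            continue
        if "localhost" in names:
            if is_loopback:
                localhost_on_loopback = True
            else:
                problems.append(f"line {lineno}: localhost mapped to {addr}")
    if not localhost_on_loopback:
        problems.append("no loopback line with localhost")
    return problems
```

With Joost's original single line, `lint_hosts` reports both problems; his fixed two-line file reports none.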


  • Philip Zeyliger at Dec 2, 2012 at 8:17 am
    Hi Shouvanik,

    Please continue to cc: scm-users. You'll get more consistent help that
    way, and more people will benefit from the answers.

    That error message says something is already listening on that port. Perhaps a
    JobTracker is already running? Perhaps the CDH service scripts have started
    it? "lsof -n -P -i | grep -i 8021" will show you which pid that is.
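The BindException means the JobTracker's attempt to bind its RPC address (port 8021 here) failed. The Python sketch below attempts the same kind of bind, so the port can be re-checked right before starting the role; `port_in_use` is a hypothetical helper, and its socket options may not match the JobTracker's exactly. It sets SO_REUSEADDR so a port lingering in TIME_WAIT is not reported as busy, while on Linux a live listener on the same address still makes the bind fail.

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if binding host:port fails with EADDRINUSE ("Address already in use")."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Don't treat a port lingering in TIME_WAIT as busy; on Linux a live
        # listener on the same address still makes bind() fail.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EADDRNOTAVAIL if `host` is not a local address
    return False
```

Pass the address the JobTracker binds to (the private IP in its log) and port 8021.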

    It seems like you had a similar problem with the supervisor. It's possible
    that something got wedged somewhere along the way and it might be worth
    killing all processes owned by the mapred and hdfs users.
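Before killing anything, it can help to see which processes the mapred and hdfs users actually own. A Python sketch that lists them by reading the real UID from /proc/<pid>/status on Linux; `uids_for` and `pids_for_uids` are hypothetical helpers, and the sketch deliberately only lists PIDs rather than signalling them.

```python
import os
import pwd


def uids_for(usernames):
    """Map UID -> user name for the given users, skipping users that don't exist."""
    result = {}
    for name in usernames:
        try:
            result[pwd.getpwnam(name).pw_uid] = name
        except KeyError:
            pass
    return result


def pids_for_uids(uid_names, proc_root="/proc"):
    """Return {pid: user name} for processes whose real UID is in `uid_names`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        uid = int(line.split()[1])  # fields: real, effective, saved, fs
                        if uid in uid_names:
                            found[int(entry)] = uid_names[uid]
                        break
        except OSError:
            continue  # the process exited while we were scanning
    return found
```

For example, `pids_for_uids(uids_for(["mapred", "hdfs"]))` returns a `{pid: user}` mapping to review before stopping those processes.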

    -- Philip
    On Sat, Dec 1, 2012 at 9:06 PM, Shouvanik Haldar wrote:

    Hi Philip,

    I am not able to start the JobTracker.
    I am getting this weird error. Can you please help?

    2012-12-02 00:04:28,032 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to ip-xx-xx-xxx-xx.ec2.internal/xx.xx.xxx.xx:8021 : Address already in use
    at org.apache.hadoop.ipc.Server.bind(Server.java:230)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:319)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1529)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:539)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:500)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2143)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2050)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:288)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4792)
    Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:228)
    ... 9 more

    2012-12-02 00:04:28,040 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down JobTracker at ip-xx-xx-xxx-xx.ec2.internal/xx.xx.xxx.xx


    Regards,
    Shouvanik


    On Sun, Dec 2, 2012 at 9:28 AM, Philip Zeyliger wrote:

    Error: Another program is already listening on a port that one of our
    HTTP servers is configured to use. Shut this program down first before
    starting supervisord.

    You've got some other process listening on 9000 or 9001, preventing the
    agent from starting. Use "lsof -P -i -n | grep 900[01]" to find that process.
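lsof finds the owner of a port by matching socket inodes, and the same lookup can be done by hand on Linux: /proc/net/tcp (and tcp6) lists sockets with their state and inode, and the links under /proc/<pid>/fd point to "socket:[inode]". This Python sketch illustrates that idea and is not a replacement for lsof; `listening_inodes`, `owners_of_inodes`, and `who_listens` are hypothetical helpers, and like lsof it needs root to see other users' processes.

```python
import os


def listening_inodes(tcp_table, ports):
    """Parse /proc/net/tcp or tcp6 text into {inode: port} for LISTEN sockets on `ports`."""
    found = {}
    for line in tcp_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # local port is hex, e.g. 2328 -> 9000
        if fields[3] == "0A" and port in ports:  # state 0A is TCP_LISTEN
            found[int(fields[9])] = port  # field 9 is the socket inode
    return found


def owners_of_inodes(inodes, proc_root="/proc"):
    """Return sorted (pid, inode) pairs for processes holding one of `inodes`."""
    targets = {"socket:[%d]" % inode: inode for inode in inodes}
    owners = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # exited, or not readable without root
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link in targets:
                owners.append((int(entry), targets[link]))
    return sorted(owners)


def who_listens(ports, proc_root="/proc"):
    """Return sorted (port, pid) pairs for processes listening on any of `ports`."""
    inodes = {}
    for table in ("tcp", "tcp6"):
        try:
            with open(os.path.join(proc_root, "net", table)) as f:
                inodes.update(listening_inodes(f.read(), ports))
        except OSError:
            pass  # e.g. no IPv6 table on this host
    return sorted((inodes[inode], pid) for pid, inode in owners_of_inodes(inodes, proc_root))
```

For example, `who_listens({9000, 9001})` returns `(port, pid)` pairs for whatever is holding the agent's ports.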


    On Sat, Dec 1, 2012 at 7:52 PM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Hi Philip,

    I am getting the following errors in
    /var/log/cloudera-scm-agent/cloudera-scm-agent.log:

    URLError: <urlopen error [Errno 111] Connection refused>
    [30/Nov/2012 06:47:33 +0000] 24977 TaskTrackerAttemptMonitor
    tasktracker ERROR TaskTracker at http://127.0.0.1:4867 is not
    responding: [Errno socket error] [Errno 111] Connection refused.


    And when I open /var/log/cloudera-scm-agent/cloudera-scm-agent.out,
    I get the following errors:

    [01/Dec/2012 22:46:41 +0000] 14523 MainThread agent INFO
    Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
    Error: Another program is already listening on a port that one of our
    HTTP servers is configured to use. Shut this program down first before
    starting supervisord.


    Please help, Philip.
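The first log excerpt in this message shows the agent getting "Connection refused" from the TaskTracker's HTTP server at 127.0.0.1:4867, which usually means nothing is listening there, for example because the TaskTracker is not running. A hedged Python sketch of a similar HTTP probe; `probe_http` is a hypothetical helper and is not the agent's actual monitoring code.

```python
import urllib.error
import urllib.request


def probe_http(url, timeout=3.0):
    """GET `url` and return (reachable, detail).

    Any HTTP response, even an error status such as 404, means a server is
    listening, so it counts as reachable.
    """
    # Bypass any http_proxy setting: the point is to reach the daemon directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as exc:
        return True, exc.code
    except urllib.error.URLError as exc:
        return False, str(exc.reason)  # e.g. "[Errno 111] Connection refused"
    except OSError as exc:
        return False, str(exc)  # e.g. a timeout while reading the response
```

A result of `(False, ...)` with a refused connection from `probe_http("http://127.0.0.1:4867/")` would match the agent's log message.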


  • Shouvanik Haldar at Dec 2, 2012 at 8:30 am
    OK Philip, I will do that.

    I used "lsof -n -P -i | grep -i 8021" but it returned nothing.

    Please advise what to do now.

    Regards,
    Shouvanik
  • Shouvanik Haldar at Dec 2, 2012 at 8:46 am
    Hi Philip,
    Sorry, but what did you mean by "that error is jogging my memory a bit"?
    What is JT?

    I am logged in as "root".
    By the way, I will post some logs here which might help you. I am facing a
    problem starting up the MapReduce service via Cloudera Manager Free
    Edition 3.7.x.
    The rest of the services (HBase, HDFS, and ZooKeeper) are running fine.

    Please wait for some time.
    Thanks for your patience.

    Regards,
    Shouvanik
    On Sun, Dec 2, 2012 at 2:02 PM, Philip Zeyliger wrote:

    If you ran that as root, that error is jogging my memory a bit. What's in
    the log for about a page before that? I think in the JT's case, that's
    the last thing it logs, but the fatal error is somewhere above it.
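The point here is that the JobTracker's final SHUTDOWN_MSG is not the cause; the useful error is logged earlier. A small Python sketch that returns the lines logged just before the last shutdown banner; `tail_before_shutdown` is a hypothetical helper that assumes the banner format seen in the JobTracker output earlier in this thread (a logger line containing SHUTDOWN_MSG, a "/****" line, then the "SHUTDOWN_MSG: Shutting down ..." line).

```python
def tail_before_shutdown(lines, context=60):
    """Return up to `context` lines logged just before the last SHUTDOWN_MSG banner.

    If the log has no SHUTDOWN_MSG, the last `context` lines are returned.
    """
    end = len(lines)
    for i in range(len(lines) - 1, -1, -1):
        if "SHUTDOWN_MSG" in lines[i]:
            end = i
            # Step back over the rest of the banner (the logger line and "/****").
            while end > 0 and ("SHUTDOWN_MSG" in lines[end - 1]
                               or lines[end - 1].startswith("/*")):
                end -= 1
            break
    return lines[max(0, end - context):end]
```

For example, pass it `open(path).read().splitlines()`, where `path` is the JobTracker's log file.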

    -- Philip

  • Shouvanik Haldar at Dec 2, 2012 at 8:56 am
    Hi,

    I am pasting the logs from
    /var/log/cloudera-scm-server/cloudera-scm-server.log:

    2012-12-02 03:47:02,994 INFO
    [CommandPusher:service.AbstractBringUpBringDownCommands@505] BringUp
    command (128) has finished unsuccessfully on service DbService{id=10,
    name=mapreduce1, serviceType=MAPREDUCE, optimisticLockVersion=5} role id:28
    name: mapreduce1-JOBTRACKER-2 hostId:ip-10-40-222-77.ec2.internal
    roleType:JOBTRACKER configuredStatus:STOPPED configGeneration:3 status:NA
    service:mapreduce1.

    2012-12-02 03:47:02,994 INFO
    [CommandPusher:service.AbstractBringUpBringDownCommands@505] BringUp
    command (129) has finished unsuccessfully on service DbService{id=10,
    name=mapreduce1, serviceType=MAPREDUCE, optimisticLockVersion=5} role id:30
    name: mapreduce1-TASKTRACKER-1 hostId:ip-10-83-35-173.ec2.internal
    roleType:TASKTRACKER configuredStatus:STOPPED configGeneration:8 status:NA
    service:mapreduce1.


    I am also pasting the error for the JobTracker.


  • Philip Zeyliger at Dec 2, 2012 at 4:56 pm
    On your machine (the one running the JobTracker; that's what JT is) there is
    a log file called /var/log/hadoop/*jobtracker*. (I'm working from memory, but
    it's something like that.) Could you send the complete log file?

    Is there a reason you're still on 3.7.x? 4.1.x is out. It supports both
    CDH3 and CDH4, so if CDH3 support is the concern, staying on 3.7.x is not
    strictly necessary.
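Since the exact file name is from memory, a glob is a convenient way to find it. A Python sketch that returns the most recently modified match; `newest_matching` is a hypothetical helper, and the pattern /var/log/hadoop/*jobtracker* is Philip's recollection, so it may need adjusting on your hosts.

```python
import glob
import os


def newest_matching(pattern):
    """Return the most recently modified regular file matching `pattern`, or None."""
    matches = [path for path in glob.glob(pattern) if os.path.isfile(path)]
    return max(matches, key=os.path.getmtime) if matches else None
```

For example, `newest_matching("/var/log/hadoop/*jobtracker*")` returns None if nothing matches, which is a hint to look elsewhere.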


    On Sun, Dec 2, 2012 at 12:56 AM, Shouvanik Haldar wrote:

    Hi,

    I am pasting the logs from "
    /var/log/cloudera-scm-server/cloudera-scm-server.log"

    2012-12-02 03:47:02,994 INFO
    [CommandPusher:service.AbstractBringUpBringDownCommands@505] BringUp
    command (128) has finished unsuccessfully on service DbService{id=10,
    name=mapreduce1, serviceType=MAPREDUCE, optimisticLockVersion=5} role id:28
    name: mapreduce1-JOBTRACKER-2 hostId:ip-10-40-222-77.ec2.internal
    roleType:JOBTRACKER configuredStatus:STOPPED configGeneration:3 status:NA
    service:mapreduce1.

    2012-12-02 03:47:02,994 INFO
    [CommandPusher:service.AbstractBringUpBringDownCommands@505] BringUp
    command (129) has finished unsuccessfully on service DbService{id=10,
    name=mapreduce1, serviceType=MAPREDUCE, optimisticLockVersion=5} role id:30
    name: mapreduce1-TASKTRACKER-1 hostId:ip-10-83-35-173.ec2.internal
    roleType:TASKTRACKER configuredStatus:STOPPED configGeneration:8 status:NA
    service:mapreduce1.


    I am also pasting error for Job Tracker..




    On Sun, Dec 2, 2012 at 2:16 PM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Hi Philip,
    Sorry, but what did u mean by "error is jogging my memory a bit".
    What is JT ?

    I am logged in as "root".
    By the way, I will post here some logs which might help you. I am facing
    problem while starting up mapreduce service via cloudera manager free
    edition 3.7.x.
    Rest of the services 1)hbase 2) hdfs and 3) zookeeper running fine...!

    Please wait for sometime.
    Thanks for your patience.

    Regards,
    Shouvanik

    On Sun, Dec 2, 2012 at 2:02 PM, Philip Zeyliger wrote:

    If you ran that as root, that error is jogging my memory a bit. What's
    in the log for about the page before that? I think in the JT's case,
    that's the last thing it logs, but the fatal error is somewhere above that.

    -- Philip

    On Sun, Dec 2, 2012 at 12:30 AM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Ok Philip...I will do that..

    I used "lsof -n -P -i | grep -i 8021" but it returned nothing.

    Please advise what to do now.

    Regards,
    Shouvanik

    On Sun, Dec 2, 2012 at 1:47 PM, Philip Zeyliger wrote:

    Hi Shouvanik,

    Please continue to cc: scm-users. You'll get more consistent help
    that way, and more people will benefit from the answers.

    That error message says someone is already on that port. Perhaps a
    job tracker is already running? Perhaps the CDH service scripts have
    started it? "lsof -n -P -i | grep -i 8021" will show you what pid that is.
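Another way to check besides lsof is to try binding the port yourself, which is what the JobTracker does at startup (see the `Server.bind` frames in the trace). This is a minimal Python sketch. It assumes you probe the same address the JobTracker binds to, which is the host's own IP as shown in the log, and port 8021. `port_in_use` is a hypothetical name. The result only reflects the moment you run it, so a port held briefly by an earlier instance may show up as free later.

```python
import errno
import socket

def port_in_use(host, port):
    """Hypothetical helper: True if binding (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EADDRNOTAVAIL if `host` is not an address of this machine
    finally:
        probe.close()  # release the port again; this is only a probe
    return False
```

For example, `port_in_use("xx.xx.xxx.xx", 8021)`, with the address taken from the JobTracker log.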

    It seems like you had a similar problem with the supervisor. It's
    possible that something got wedged somewhere along the way and it might be
    worth killing all processes owned by the mapred and hdfs users.
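Before killing anything, it helps to see which processes the mapred and hdfs users actually own. This is a minimal, Linux-only Python sketch. It assumes /proc is mounted and that the owner of /proc/<pid> reflects the process's effective UID, which holds for most processes. `uids_for` and `pids_owned_by` are hypothetical helpers. The sketch only lists PIDs and does not kill them.

```python
import os
import pwd

def uids_for(usernames):
    """Map user names such as "mapred" and "hdfs" to UIDs, skipping unknown ones."""
    uids = set()
    for name in usernames:
        try:
            uids.add(pwd.getpwnam(name).pw_uid)
        except KeyError:
            pass  # no such user on this host
    return uids

def pids_owned_by(uids, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid> entry is owned by one of `uids`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            owner = os.stat(os.path.join(proc_root, entry)).st_uid
        except FileNotFoundError:
            continue  # the process exited while we were scanning
        if owner in uids:
            pids.append(int(entry))
    return sorted(pids)
```

A dry run would be `print(pids_owned_by(uids_for(["mapred", "hdfs"])))`. Review that list before sending `os.kill(pid, signal.SIGTERM)` to anything.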

    -- Philip


    On Sat, Dec 1, 2012 at 9:06 PM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Hi Philip,

    I am not able to start the JobTracker.
    I'm getting this weird error. Can you please help?

    2012-12-02 00:04:28,032 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to ip-xx-xx-xxx-xx.ec2.internal/xx.xx.xxx.xx:8021 : Address already in use
        at org.apache.hadoop.ipc.Server.bind(Server.java:230)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:319)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:1529)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:539)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:500)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2143)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2050)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:288)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4792)
    Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at org.apache.hadoop.ipc.Server.bind(Server.java:228)
        ... 9 more

    2012-12-02 00:04:28,040 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down JobTracker at ip-xx-xx-xxx-xx.ec2.internal/xx.xx.xxx.xx


    Regards,
    Shouvanik


    On Sun, Dec 2, 2012 at 9:28 AM, Philip Zeyliger wrote:

    Error: Another program is already listening on a port that one of
    our HTTP servers is configured to use. Shut this program down first before
    starting supervisord.

    You've got some other process listening on 9000 or 9001, preventing
    the agent from starting. Use "lsof -P -i -n | grep 900[01]" to find that
    process.
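If the lsof output is long, the listeners on 9000 and 9001 can be picked out programmatically. The `-P` and `-n` flags keep ports and addresses numeric, which makes the output easy to parse. lsof may also need to run as root to show other users' sockets. This is a minimal Python sketch. It assumes lsof's default column layout (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME), with the socket state, such as `(LISTEN)`, as the last field. `listeners_on` is a hypothetical name.

```python
def listeners_on(lsof_output, ports=(9000, 9001)):
    """Return (command, pid, address) for sockets in LISTEN state on any of `ports`.

    Hypothetical helper; expects `lsof -P -i -n` output (numeric ports,
    connection state as the last whitespace-separated field).
    """
    found = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skips the header row, short lines, and non-listening sockets.
        if len(fields) < 10 or fields[-1] != "(LISTEN)":
            continue
        address = fields[-2]               # e.g. "*:9000" or "10.0.0.5:9001"
        port = address.rsplit(":", 1)[-1]
        if port.isdigit() and int(port) in ports:
            found.append((fields[0], int(fields[1]), address))
    return found
```

To use it, capture the command's output, for example with `subprocess.run(["lsof", "-P", "-i", "-n"], capture_output=True, text=True).stdout`, and pass it in.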


    On Sat, Dec 1, 2012 at 7:52 PM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Hi Philip,

    I am getting the following errors in
    "/var/log/cloudera-scm-agent/cloudera-scm-agent.log":

    URLError: <urlopen error [Errno 111] Connection refused>
    [30/Nov/2012 06:47:33 +0000] 24977 TaskTrackerAttemptMonitor
    tasktracker ERROR TaskTracker at http://127.0.0.1:4867 is not
    responding: [Errno socket error] [Errno 111] Connection refused.


    And when I open up
    "/var/log/cloudera-scm-agent/cloudera-scm-agent.out",
    I get the following errors:

    [01/Dec/2012 22:46:41 +0000] 14523 MainThread agent INFO
    Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
    Error: Another program is already listening on a port that one of
    our HTTP servers is configured to use. Shut this program down first before
    starting supervisord.


    Please help, Philip.



    On Sun, Dec 2, 2012 at 8:02 AM, Philip Zeyliger <
    philip@cloudera.com> wrote:
    Hi Shouvanik,

    Is the agent on your various hosts running? ("service
    cloudera-scm-agent status" or "ps aux | grep agent.py" might tell you.) If
    it is, what do its logs in /var/log/cloudera-scm-agent say?

    -- Philip
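`ps aux | grep agent.py` also tends to match the grep command itself. A Python alternative reads each process's command line directly. This is a Linux-only sketch that assumes /proc is available and that the agent's command line contains "agent.py", as Philip's grep implies. `find_pids_by_cmdline` is a hypothetical name. `service cloudera-scm-agent status` remains the more direct check.

```python
import os

def find_pids_by_cmdline(needle, proc_root="/proc"):
    """Hypothetical helper: sorted PIDs whose command-line arguments contain `needle`."""
    target = needle.encode()
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, and other non-process entries
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                args = f.read().split(b"\0")  # arguments are NUL-separated
        except OSError:
            continue  # the process exited or is not readable
        if any(target in arg for arg in args):
            pids.append(int(entry))
    return sorted(pids)
```

For example, `find_pids_by_cmdline("agent.py")` returns an empty list if no agent process is running.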

    On Fri, Nov 30, 2012 at 11:00 PM, Shouvanik Haldar <
    shouvanik.haldar@gmail.com> wrote:
    Hi Philip,

    Please help me. I am not able to see hosts even after
    installation. Can you please give a checklist of everything I need
    to take care of before doing this?

    Regards,
    Shouvanik

    On Saturday, 5 May 2012 07:17:12 UTC+5:30, Philip Zeyliger wrote:

    Hi Joost,

    There are situations where the manager will talk to the agent
    over port 9000. In the Free Edition, it'll do that for viewing log files,
    so, yes, you should open it up.
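To check Philip's point about port 9000 from the Cloudera Manager server's side, a plain TCP connect is enough to tell "open" from "refused" from "no answer". This is a minimal Python sketch, and `check_tcp` is a hypothetical helper. "Refused" is the same Errno 111 seen in the agent log earlier in this thread. It usually means nothing is listening, though an iptables REJECT rule can look the same. A DROP rule typically shows up as a timeout.

```python
import socket

def check_tcp(host, port, timeout=3.0):
    """Hypothetical helper: return "open", "refused", or "timeout/unreachable"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"              # reachable, but nothing accepted the connection
    except OSError:
        return "timeout/unreachable"  # often a firewall dropping packets, or routing
```

For example, `check_tcp("cdhnode1.diversit.local", 9000)` can be run from the manager host.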

    -- Philip

    On Mon, Apr 30, 2012 at 12:39 AM, Joost den Boer <
    jdb...@diversit.eu> wrote:
    Philip, Harsh,

    Thanks for your quick reply.
    Yes, fixing the /etc/hosts helped.
    Now it contains:
    127.0.0.1 localhost
    192.168.1.221 cdhnode1.diversit.local cdhnode1

    I also changed localhost in /etc/sysconfig/network.

    I thought I read somewhere that the 127.0.0.1 line had to be removed
    and /etc/hosts should only have one line. That's why I had it the way it
    was.
    Another post suggested testing with 'host -v -t A
    `hostname`', and that resolved to the correct FQDN, so I thought that would
    not be the problem.

    Anyway, thanks again for your help.
    PS: I see the agent opens port 9000. Is the manager talking
    to the agent over this port? If so, should this port be opened in iptables?

    Regards,
    Joost


    On Sun, Apr 29, 2012 at 10:01 PM, Philip Zeyliger <
    phi...@cloudera.com> wrote:

    On Sun, Apr 29, 2012 at 1:00 PM, Harsh J wrote:

    Additionally, I'd like to note that in:

    192.168.1.221 cdhnode1 cdhnode1.diversit.local localhost

    1. The "localhost" seems out of place. It should point to 127.0.0.1
    alone.
    Ah; I missed this in your original message. This explains
    what you're seeing, and is wrong. localhost should always point to
    127.0.0.1.
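The rule stated here, that localhost must map only to 127.0.0.1, can be checked mechanically. This is a minimal Python sketch using the standard `ipaddress` module. `bad_localhost_entries` is a hypothetical name. It only checks the localhost rule, not whether the host's own name resolves correctly. The fixed file in this thread also follows the usual hosts(5) order of address, canonical name (FQDN), then aliases.

```python
import ipaddress

def bad_localhost_entries(hosts_text):
    """Hypothetical helper: /etc/hosts lines that map "localhost" to a non-loopback address."""
    bad = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if "localhost" in names and not ipaddress.ip_address(address).is_loopback:
            bad.append(raw)
    return bad
```

Run it on `open("/etc/hosts").read()`. Joost's original line "192.168.1.221 cdhnode1 cdhnode1.diversit.local localhost" is flagged. His corrected file, with "127.0.0.1 localhost" and "192.168.1.221 cdhnode1.diversit.local cdhnode1", passes.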

    --
    Thanks,
    *Shouvanik*

  • Shouvanik Haldar at Dec 4, 2012 at 3:48 pm
    Hi,

    The JT now starts because I created a directory with 777 permissions at
    /var/log/hadoop/history.
    But the next problem I am facing is:

    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
    /tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes,
    instead of 1

    However, all the services start!
    Please help.

    Regards,
    Shouvanik
    On Sun, Dec 2, 2012 at 10:20 PM, Philip Zeyliger wrote:

    On your machine (the one with the jobtracker (that's what JT is)) is a log
    file called /var/log/hadoop/*jobtracker*. (I'm working from memory, but
    it's something like that.) Could you send the complete log file there?

    Is there a reason you're still on 3.7.x? 4.1.x is out. It supports both
    CDH3 and CDH4, so if CDH3 support is the concern, staying on 3.7.x isn't strictly necessary.



  • Philip Zeyliger at Dec 2, 2012 at 10:01 pm
    As you no doubt figured out, the real error was the following:

    2012-12-02 03:51:13,813 ERROR org.apache.hadoop.security.UserGroupInformation:
    PriviledgedActionException as:mapred (auth:SIMPLE) cause:ENOENT: No such
    file or directory
    2012-12-02 03:51:13,813 WARN org.apache.hadoop.mapred.JobTracker: Error
    starting tracker: ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)

    You've clearly resolved it by changing permissions.
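The fix in this thread was to create /var/log/hadoop/history with mode 777. A narrower option is to make the directory owned by the mapred user with mode 755. This assumes the JobTracker runs as mapred, as the `as:mapred` log line suggests. The following minimal Python sketch does either. `ensure_dir` is a hypothetical helper, and changing the owner requires root.

```python
import os
import pwd
import stat

def ensure_dir(path, mode=0o755, owner=None):
    """Hypothetical helper: create `path` (with parents) if missing, set its mode and owner."""
    os.makedirs(path, exist_ok=True)
    # makedirs() filters its mode through the umask, so set the bits explicitly.
    os.chmod(path, mode)
    if owner is not None:
        pw = pwd.getpwnam(owner)              # e.g. "mapred"; KeyError if unknown
        os.chown(path, pw.pw_uid, pw.pw_gid)  # needs root
    return stat.S_IMODE(os.stat(path).st_mode)
```

On the JobTracker host, running as root, this might be `ensure_dir("/var/log/hadoop/history", 0o755, owner="mapred")`. The 777 fix corresponds to `ensure_dir("/var/log/hadoop/history", 0o777)`.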

    On Sun, Dec 2, 2012 at 11:11 AM, Shouvanik Haldar wrote:

    Hi,

    The JT now starts because I created a directory with 777 permissions at
    /var/log/hadoop/history.
    But the next problem I am facing is:

    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
    /tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes,
    instead of 1
    This typically means that HDFS doesn't have free space, usually because
    the "reserved" disk space per node is set too high, so there's nothing
    left over. Visit the namenode status page (port 50070 by default) and see
    what it says about free space. If that's the cause, find the "datanode
    reserved disk space" option and lower it to something smaller than the
    default 10GB. In CM 4.1 we've made that default smarter, so it's lower in
    environments such as yours.

    -- Philip
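Philip's explanation comes down to simple arithmetic. If the per-volume reservation is at least as large as the disk, HDFS has no room left, and the NameNode can place the block on 0 nodes. This is a rough Python sketch with hypothetical disk sizes. The 10GB default comes from Philip's message. As far as we know, the matching HDFS property is `dfs.datanode.du.reserved`, which is set in bytes per volume. The real DataNode accounting and block-placement checks are more involved, so treat this as a conservative estimate.

```python
GiB = 1024 ** 3

def dfs_remaining(capacity, used, reserved):
    """Rough, conservative bytes left for HDFS on one volume (all values in bytes)."""
    # Assumption: treat the reservation and everything already on disk as unavailable.
    return max(0, capacity - reserved - used)

def writable_nodes(nodes, reserved):
    """Names of DataNodes with room left; `nodes` is [(name, capacity, used), ...]."""
    return [name for name, cap, used in nodes if dfs_remaining(cap, used, reserved) > 0]
```

For example, with a hypothetical 8 GiB volume that already holds 2 GiB, a 10 GiB reservation leaves nothing, so `writable_nodes` returns an empty list. Lowering the reservation to 1 GiB leaves about 5 GiB.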

    However, all the services start!
    Please help.

    Regards,
    Shouvanik


Discussion Overview
group: scm-users @
categories: hadoop
posted: Dec 2, '12 at 2:32a
active: Dec 4, '12 at 3:48p
posts: 8
users: 2
website: cloudera.com
irc: #hadoop


site design / logo © 2022 Grokbase