This doesn't seem possible, but it has happened twice: services being started
on the wrong port. The bug is transient. I discovered it when a TaskTracker
instance would not start and the TT log showed:

    2012-12-03 18:45:55,729 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Address already in use

That is odd, so I looked for the process holding the TT port with:

    fuser -n tcp 50060

...and it shows a RegionServer process holding that port. I suspected fuser
was lying; perhaps the way CM starts the process wraps it in a misleading way?
However, when I stop the RegionServer, the TT port is freed and I can start
the TT as usual. It really appears that a RegionServer is being started on
a TaskTracker port.

Not sure what could be causing this, but I've seen it twice recently (CM 4.0.3).


Neil


  • bc Wong at Dec 4, 2012 at 12:23 am

    Hi Neil,

    Some quick reactions:
    * 50060 is right in the ephemeral port range, i.e. the range the kernel
    draws from when it assigns a local port to an outbound connection.
    * Is it possible that the RegionServer is using that port for outbound
    traffic (as opposed to listening on it)? You can find out with
    netstat/lsof.
    * Go to CM -> HBase -> Config and enter "port" in the search box. Is the RS
    configured to use 50060 at all?

    If it's an ephemeral port collision, there is no good solution other than
    changing your TT port. Hadoop's default ports (such as 50060) sit inside the
    ephemeral range, and CM follows those defaults for familiarity. (Maybe it
    shouldn't.)

    Cheers,
    bc
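To check the first point on a given host, here is a minimal Python sketch that reads the kernel's ephemeral range from /proc/sys/net/ipv4/ip_local_port_range (Linux only; the file holds two integers, low and high) and reports whether each port falls inside it. The fallback range of 32768 to 61000 and the list of ports checked are assumptions for illustration, not values taken from the thread.

```python
from __future__ import annotations

from pathlib import Path

RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")  # Linux only


def parse_port_range(text: str) -> tuple[int, int]:
    """The file holds two whitespace-separated integers: low and high."""
    low, high = map(int, text.split())
    return low, high


def in_ephemeral_range(port: int, port_range: tuple[int, int]) -> bool:
    """True if the kernel may hand out `port` as a source port for outbound connections."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    try:
        port_range = parse_port_range(RANGE_FILE.read_text())
    except OSError:
        # Assumption: a common default, used only when /proc is unavailable.
        port_range = (32768, 61000)
    for port in (50010, 50060, 60020):
        status = "inside" if in_ephemeral_range(port, port_range) else "outside"
        print(f"port {port} is {status} the ephemeral range {port_range}")
```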
  • Neil Y at Dec 4, 2012 at 1:12 am
    This is with all-default CM ports. I'm not sure whether the RS was using
    50060 for outbound traffic; I'll try to remember to run netstat/lsof if I see
    this issue again. But the RS is definitely not configured for it, and the XMLs
    in /var/run/cloudera-scm-agent/ do not show anything misconfigured (HBase
    or MR).

    Maybe the old TT web UI didn't shut down cleanly? I'm also curious whether the
    agent/supervisor starts each process with the correct ownership; I recall CM
    uses some tricks for that. It's easy to fix once identified, though: just
    restart the process holding the "wrong" port.


    Neil
  • Neil Y at Dec 4, 2012 at 1:31 am
    I found another node with this issue and ran netstat and lsof:

    # netstat -a | grep 50060
    Proto Recv-Q Send-Q Local Address          Foreign Address        State
    tcp        1      0 storage24.prod:50060   storage24.prod:50010   CLOSE_WAIT

    # lsof | grep 50060
    COMMAND   PID  USER   FD     TYPE  DEVICE     SIZE/OFF  NODE  NAME
    java    11604  hbase  1858u  IPv6  772143230  0t0       TCP   storage24.prod:50060->storage24.prod:50010 (CLOSE_WAIT)

    I'm still not sure why the hbase user is holding a port intended for the
    TaskTracker. Odd.

    Neil
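This output already answers the listening-versus-outbound question: the socket's local port is 50060, but it has a peer on 50010 and is in CLOSE_WAIT, so it is one end of a connection, not a listener. As a minimal sketch of the same check without netstat, the Python below parses the Linux /proc/net/tcp and /proc/net/tcp6 tables, assuming their usual layout (addresses as HEXADDR:HEXPORT, state as a hex code where 0A is LISTEN and 08 is CLOSE_WAIT). The helper name sockets_on_port is hypothetical, and unlike lsof or fuser it does not map the socket back to a PID.

```python
from __future__ import annotations

from pathlib import Path

# Kernel TCP state codes as printed (in hex) in the "st" column of /proc/net/tcp*.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def sockets_on_port(proc_text: str, port: int) -> list[tuple[int, int, str]]:
    """Return (local_port, remote_port, state) for sockets whose local port is `port`."""
    found = []
    for line in proc_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3].upper()
        # IPv4 and IPv6 entries both end in ":HEXPORT", so split on the last colon.
        local_port = int(local.rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        remote_port = int(remote.rsplit(":", 1)[1], 16)
        found.append((local_port, remote_port, TCP_STATES.get(state, state)))
    return found


if __name__ == "__main__":
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if not path.exists():
            continue
        for local_port, remote_port, state in sockets_on_port(path.read_text(), 50060):
            kind = "listening" if state == "LISTEN" else f"connection to port {remote_port}"
            print(f"{name}: local port {local_port}, {state} ({kind})")
```

A LISTEN entry on 50060 would mean a daemon really is bound there; any other state means the port was merely assigned to the local end of a connection.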
  • bc Wong at Dec 4, 2012 at 2:35 am

    So it's an ephemeral port collision. The RS is using 50060 as the local end of
    a connection to (probably) a DN listening on 50010 (the default xceiver port).
    Every outbound connection has to use some local port. It happened to get 50060
    because the port was free, totally unaware that the TT needs it.
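The point that an outbound connection always gets some local port can be seen with a short Python sketch: the client socket never calls bind(), so on connect() the kernel assigns it a free source port from the ephemeral range, much as it did for the RS's connection to the DN. The loopback listener and the function name outbound_local_port are illustrative assumptions; nothing here is Hadoop-specific.

```python
import socket


def outbound_local_port() -> int:
    """Open a loopback connection and return the source port the kernel chose for it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the kernel for any free port for this throwaway listener.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]
        # The client never binds explicitly; connect() makes the kernel pick
        # a source port from the ephemeral range.
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            return client.getsockname()[1]


if __name__ == "__main__":
    print("kernel-chosen source port:", outbound_local_port())
```

Run it a few times and the printed port changes. Nothing stops the chosen value from being one that a daemon such as the TT expects to bind later.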

    You can:
    * Always start your MR service before HBase, so the TT binds 50060 before the
    RS opens its connections.
    * Or reconfigure the daemons to avoid using ports in the ephemeral range.
    * Or change the ephemeral port range on your host. See
    `cat /proc/sys/net/ipv4/ip_local_port_range`.

    Cheers,
    bc
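For the third option, one way to choose a new range is to keep the largest contiguous block of the current range that contains none of the daemons' fixed ports. A minimal Python sketch follows. The port table is an assumption (commonly cited Hadoop 1 / HBase 0.9x defaults) and should be replaced with the ports CM actually configured; the 32768 to 61000 range is likewise an assumed starting point; and largest_free_subrange is a hypothetical helper name.

```python
from __future__ import annotations

# Assumed fixed listening ports (Hadoop 1 / HBase 0.9x defaults); verify in CM.
DAEMON_PORTS = {
    50010: "DataNode xceiver", 50020: "DataNode IPC", 50075: "DataNode HTTP",
    50030: "JobTracker HTTP", 50060: "TaskTracker HTTP",
    50070: "NameNode HTTP", 50090: "SecondaryNameNode HTTP",
    60000: "HBase Master", 60010: "HBase Master HTTP",
    60020: "RegionServer", 60030: "RegionServer HTTP",
}


def largest_free_subrange(low: int, high: int, reserved) -> tuple[int, int] | None:
    """Largest contiguous [start, end] inside [low, high] that contains no reserved port."""
    cuts = sorted(p for p in set(reserved) if low <= p <= high)
    best = None
    start = low
    # high + 1 acts as a sentinel so the tail of the range is considered too.
    for p in cuts + [high + 1]:
        end = p - 1
        if end >= start and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
        start = p + 1
    return best


if __name__ == "__main__":
    low, high = 32768, 61000  # assumption: replace with your ip_local_port_range values
    colliding = sorted(p for p in DAEMON_PORTS if low <= p <= high)
    print("daemon ports inside the ephemeral range:", colliding)
    print("largest collision-free sub-range:", largest_free_subrange(low, high, DAEMON_PORTS))
```

With these assumed inputs the suggestion is 32768 to 50009, which could be applied with `sysctl -w net.ipv4.ip_local_port_range="32768 50009"` and persisted in /etc/sysctl.conf. The trade-off is a smaller pool of source ports for outbound connections.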

Discussion Overview
group: cm-users
categories: hadoop
posted: Dec 4, '12 at 12:02a
active: Dec 4, '12 at 2:35a
posts: 5
users: 2 (Neil Y: 3 posts, bc Wong: 2 posts)
website: cloudera.com
irc: #hadoop
