Grokbase Groups HBase dev June 2011
We had a QA cluster that was left running during some maintenance to
DNS etc. in our colo. Everything is fine in the RS (regionserver)
logs until:

2011-05-14 23:11:46,154 ERROR org.apache.hadoop.hbase.HServerAddress:
Could not resolve the DNS name of c0505.hal.cloudera.com:60000
2011-05-14 23:11:46,154 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
java.lang.IllegalArgumentException: Could not resolve the DNS name of
c0505.hal.cloudera.com:60000
at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMasterAddress(HRegionServer.java:1469)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1442)
at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:742)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
at java.lang.Thread.run(Thread.java:619)
2011-05-14 23:12:14,175 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect
to Master server at c0505.hal.cloudera.com:60000
2011-05-14 23:12:14,177 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to
master at c0505.hal.cloudera.com:60000
2011-05-14 23:12:14,178 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect
to Master server at c0505.hal.cloudera.com:60000
2011-05-14 23:12:14,179 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to
master at c0505.hal.cloudera.com:60000
followed by many GB of the above two messages alternating.

This is something close to 0.90.1 plus a few patches here and
there. Does this ring a bell for anyone, or should I dig? It looks like
in trunk this code is mostly rewritten by HBASE-3827/HBASE-1502. I do
have HBASE-3545 in the build.

--
Todd Lipcon
Software Engineer, Cloudera

  • Stack at Jun 1, 2011 at 3:00 am
    Like you say, it should be gone in 0.92.x.

    On each regionserver report, we'd deserialize an HServerAddress
    instance. As part of deserialization, we'd create an InetSocketAddress
    instance, and creating it triggers a DNS resolve. In the HServerAddress
    (HSA) constructor, if the InetSocketAddress failed to resolve, we'd
    throw the IllegalArgumentException shown in your log.
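
A minimal sketch of the resolve-on-construction behaviour described here, using only JDK classes. It is not the HBase source: the class name, `checkResolved`, `fromHostPort`, and the host `master.invalid` are hypothetical stand-ins, and the real HServerAddress code may differ in detail.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration; not taken from HBase.
class ResolveOnConstruct {

    // Mirrors the idea of the constructor check: reject an address whose
    // host name could not be resolved.
    static InetSocketAddress checkResolved(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                "Could not resolve the DNS name of "
                    + addr.getHostString() + ":" + addr.getPort());
        }
        return addr;
    }

    // new InetSocketAddress(String, int) attempts a lookup in the
    // constructor and flags the address as unresolved if that fails.
    static InetSocketAddress fromHostPort(String host, int port) {
        return checkResolved(new InetSocketAddress(host, port));
    }

    public static void main(String[] args) {
        // createUnresolved skips the lookup, which simulates a failed
        // resolve without depending on the local DNS setup.
        InetSocketAddress failed =
            InetSocketAddress.createUnresolved("master.invalid", 60000);
        try {
            checkResolved(failed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the lookup happens every time such an object is built, a DNS outage turns every report into a thrown exception, which matches the log above.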

    Not sure what you can do about it in 0.90.x w/o major surgery. I
    suppose you could just catch the exception and drop the report on the
    ground until resolve works again.
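
The "catch and drop" idea could look roughly like the sketch below. This is an assumption-laden illustration, not a patch against 0.90.x: `tryReport`, the `Supplier` that looks up the master address, and the `Consumer` that sends the report are all hypothetical names.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration; not taken from HBase.
class DropReportOnResolveFailure {

    // Returns true if the report was sent, false if it was dropped
    // because the master address could not be resolved.
    static boolean tryReport(Supplier<String> masterAddress,
                             Consumer<String> sendReport) {
        try {
            // Stand-in for the lookup that may throw on a failed resolve.
            String addr = masterAddress.get();
            sendReport.accept(addr);
            return true;
        } catch (IllegalArgumentException e) {
            // Drop this report; the next one gets a fresh chance once
            // DNS works again.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean sent = tryReport(
            () -> { throw new IllegalArgumentException("Could not resolve"); },
            addr -> System.out.println("report sent to " + addr));
        System.out.println("sent=" + sent);
    }
}
```

The caller still needs to pause before the next report, otherwise dropped reports would simply be retried back to back.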

    St.Ack
  • Todd Lipcon at Jun 1, 2011 at 10:21 pm

    Yeah, the question is why it got into a tight retry loop instead of
    either (a) sleeping between retries, or (b) shutting down after some
    number of retries.

    The code looks like it's supposed to do (a), but the log messages are
    only a few milliseconds apart.
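
For reference, options (a) and (b) can be combined into a bounded retry with a pause between attempts. The sketch below is generic Java, not the HRegionServer code; `retry`, `maxAttempts`, and `sleepMillis` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration; not taken from HBase.
class BoundedRetry {

    // Runs op until it succeeds, sleeping sleepMillis between failed
    // attempts (a), and giving up after maxAttempts by rethrowing the
    // last failure (b).
    static <T> T retry(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Without this pause the loop spins as fast as the
                    // operation can fail.
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("master not reachable yet");
            }
            return "connected";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A loop that logs only milliseconds apart suggests that some path, such as a successful connect followed by an immediate failure, bypasses the sleep.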

    --
    Todd Lipcon
    Software Engineer, Cloudera

Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jun 1, '11 at 2:22a
active: Jun 1, '11 at 10:21p
posts: 3
users: 2
website: hbase.apache.org
