During a routine MapReduce job, my map tasks start to fail on only
one regionserver until the entire job fails. The error the tasks are getting
is related to lease timeouts, so I raised my lease timeout to 600
seconds (via Cloudera Manager) and ran the job again. The same
issue persisted.

I investigated the regionserver logs and found something very odd:

Aborting call [...truncated...] after 61741 ms, since caller disconnected


Notice that the call is being aborted after only about 60 seconds, 1/10 of
the lease timeout I had just set.

I looked inside my hbase-site.xml in the path "/etc/hbase/conf" to find
that no lease.period was actually set there.

Am I missing something? Or is this a bug of sorts?

Environment:

Cloudera Manager 4.5.2 (#327 built by jenkins on 20130429-1453 git:
16cab2c7b76194b7877d64a4215494daa387a266)
CDH 4.2.1-1.cdh4.2.1.p0.5


  • Darren Lo at Jul 3, 2013 at 4:28 pm
    Hi Bryan,

    Did you restart HBase after making the config change? Does modifying
    /etc/hbase/conf to include the timeout fix the problem?

    Thanks,
    Darren

  • Bryan at Jul 3, 2013 at 7:57 pm
    Yes, I restarted HBase. And yes, I edited the file (and restarted), but
    the problem persisted. I looked at the regionserver's HBase
    configuration (via HTTP) and the value was set correctly there:

    <property>
    <name>hbase.regionserver.lease.period</name>
    <value>600000</value>
    <source>hbase-site.xml</source>
    </property>

    The only place where I see a differing lease.period is in the HMaster and
    I'm not even sure why that would be the case or why that would only affect
    1 regionserver.
  • Darren Lo at Jul 3, 2013 at 11:09 pm
    Can you try putting your lease period XML into "HBase Client Configuration
    Safety Valve for hbase-site.xml", then deploying the client configuration
    (which will update /etc/hbase/conf) and retrying your job?
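
    For reference, a minimal sketch of what that safety-valve entry might
    look like. The property name and the 600000 ms value are the ones Bryan
    reported above; that the safety valve accepts a bare <property> element
    in this form is an assumption here.

    ```xml
    <!-- Assumed safety-valve entry: 600000 ms = the 600 s lease timeout from this thread -->
    <property>
      <name>hbase.regionserver.lease.period</name>
      <value>600000</value>
    </property>
    ```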

    It's certainly strange that only 1 regionserver is affected, unless that
    regionserver is running really slowly. Try to check that as well.

    Thanks,
    Darren

  • bc Wong at Jul 3, 2013 at 11:18 pm

    I don't think this has to do with your server-side config, because
    it's the client that is disconnecting. What is your
    `hbase.rpc.timeout` set to?

    Cheers,
    bc
  • Bryan at Jul 5, 2013 at 7:52 pm
    My rpc.timeout is currently set to 60 seconds. I'll see how changing it
    affects my issue.
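
    A 60-second RPC timeout lines up with the "after 61741 ms, since caller
    disconnected" message in the regionserver log. As a sketch only, a
    client-side override could look like the fragment below. The 600000 ms
    value is an assumption chosen to match the lease period, not a value
    given in this thread, and the thread does not confirm that this change
    resolves the failures.

    ```xml
    <!-- Illustrative only: 600000 ms is an assumed value, not one confirmed in this thread -->
    <property>
      <name>hbase.rpc.timeout</name>
      <value>600000</value>
    </property>
    ```

    Because it is the client that disconnects, this would need to reach the
    configuration the MapReduce tasks actually read (for example the deployed
    client hbase-site.xml), not only the regionserver.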

Discussion Overview
group: cm-users
category: hadoop
posted: Jul 3, '13 at 3:22p
active: Jul 5, '13 at 7:52p
posts: 6
users: 3
website: cloudera.com
irc: #hadoop

3 users in discussion

Bryan: 3 posts Darren Lo: 2 posts bc Wong: 1 post
