[hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
--------------------------------------------------------------------------

Key: HADOOP-2017
URL: https://issues.apache.org/jira/browse/HADOOP-2017
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: stack
Priority: Minor


In patch build #903, the metascanner keeps trying to go to the downed server even though onlineMetaRegions has been updated with the new location, and then the metascanner just goes away (or hangs).

In nightly build #266, it's a similar scenario, only the remaining region servers decide to shut down because they haven't been able to reach the master in 7 seconds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • stack (JIRA) at Oct 9, 2007 at 8:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533478 ]

    stack commented on HADOOP-2017:
    -------------------------------

    Nightly #263 also failed on TRSA (TestRegionServerAbort) in the same manner as patch build #903.
  • stack (JIRA) at Oct 9, 2007 at 8:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-2017:
    --------------------------

    Attachment: trsa.patch

    A patch with more logging and thread dumping to help see what is going on, and a mechanism that notices moved regions sooner.
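
The thread-dumping idea can be sketched roughly as below. This is a minimal illustration, not the code in trsa.patch: the class name `ThreadDumpingJoinSketch`, the `dumpAllThreads` helper, the method signature, and the dump format are all assumptions. Only the method name `threadDumpingJoin` and the once-a-minute interval come from the patch notes.

```java
import java.util.Map;

/** Hypothetical helper; names and output format are illustrative, not taken from trsa.patch. */
final class ThreadDumpingJoinSketch {
    private ThreadDumpingJoinSketch() {}

    /** Builds a textual dump of every live thread and its current stack. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Waits for t to finish. Each time intervalMillis elapses while t is
     * still alive, prints a dump of all threads to stderr.
     * Returns the number of dumps printed.
     */
    static int threadDumpingJoin(Thread t, long intervalMillis) throws InterruptedException {
        if (intervalMillis <= 0) {
            // join(0) would block forever, which defeats the purpose.
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        int dumps = 0;
        while (t.isAlive()) {
            t.join(intervalMillis);
            if (t.isAlive()) {
                dumps++;
                System.err.println("Still waiting on " + t.getName() + "; thread dump #" + dumps);
                System.err.print(dumpAllThreads());
            }
        }
        return dumps;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(250L); // stands in for a slow verification step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "verifier");
        worker.start();
        // A test would more likely pass 60_000L (once a minute); 100 ms keeps the demo short.
        int dumps = threadDumpingJoin(worker, 100L);
        System.out.println("dumps printed: " + dumps);
    }
}
```

A test would call this in place of a plain `join()` on the thread running the verification, so a hang shows up in the build log as a series of stack dumps rather than as a silent timeout.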

    {code}
    HADOOP-2017 TestRegionServerAbort failure in patch build #903 and nightly #266

    Notice moved META regions sooner. Also added more logging and
    thread dumping once a minute when test starts to take too long
    so can see where we are hung (if we are hung).

    M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHStoreFile.java
        Inherit from HBaseTestCase.
    M src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
        (threadDumpingJoin): Added.
    M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestRegionServerAbort.java
        Run verification in its own thread so can concurrently thread dump if
        test is going on too long.
    M src/contrib/hbase/src/test/org/apache/hadoop/hbase/DFSAbort.java
        Moved join up into parent class.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/Chore.java
        Remove unused import.
    M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
        (MetaRegion.toString): Added.
        Added logging around assignment checking and log split.
        (MetaRegion.compareTo): Add consideration of server address.
        (numberOfMetaRegions, metaRegionsToScan, onlineMetaRegions):
        Put declaration and assignment together and made final.
        (scanOneMetaRegion): If the region is no longer in onlineMetaRegions,
        give up trying to scan.
        (unassignRootRegion): Added (Not yet finished).
    {code}
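
The two HMaster changes that make the scanner notice a moved region, the `compareTo` that considers the server address and the early exit in `scanOneMetaRegion`, might look something like the sketch below. It is a simplified model under stated assumptions, not the HMaster code: `MetaRegionSketch`, `MetaScanSketch`, the `Outcome` enum, the string-keyed map, the retry loop, and the region and host names are all hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Predicate;

/** Simplified, hypothetical stand-in for the master's MetaRegion. */
final class MetaRegionSketch implements Comparable<MetaRegionSketch> {
    final String regionName;
    final String serverAddress; // host:port of the region server hosting the region

    MetaRegionSketch(String regionName, String serverAddress) {
        this.regionName = Objects.requireNonNull(regionName);
        this.serverAddress = Objects.requireNonNull(serverAddress);
    }

    /** Orders by region name, then server address, so a moved region compares as different. */
    @Override
    public int compareTo(MetaRegionSketch other) {
        int c = regionName.compareTo(other.regionName);
        return c != 0 ? c : serverAddress.compareTo(other.serverAddress);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MetaRegionSketch && compareTo((MetaRegionSketch) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(regionName, serverAddress);
    }

    @Override
    public String toString() {
        return "regionname: " + regionName + ", server: " + serverAddress;
    }
}

final class MetaScanSketch {
    enum Outcome { SCANNED, GAVE_UP_REGION_MOVED, ATTEMPTS_EXHAUSTED }

    private MetaScanSketch() {}

    /**
     * Tries to scan the given region up to maxAttempts times. Before every
     * attempt it checks that this exact region (same name and same server)
     * is still listed in onlineMetaRegions, and gives up if it is not.
     */
    static Outcome scanOneMetaRegion(Map<String, MetaRegionSketch> onlineMetaRegions,
                                     MetaRegionSketch region,
                                     int maxAttempts,
                                     Predicate<MetaRegionSketch> attemptScan) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!region.equals(onlineMetaRegions.get(region.regionName))) {
                return Outcome.GAVE_UP_REGION_MOVED; // stale location: stop retrying the downed server
            }
            if (attemptScan.test(region)) {
                return Outcome.SCANNED;
            }
        }
        return Outcome.ATTEMPTS_EXHAUSTED;
    }

    public static void main(String[] args) {
        Map<String, MetaRegionSketch> online = new ConcurrentSkipListMap<>();
        MetaRegionSketch stale = new MetaRegionSketch("meta-region-1", "host-a:60020");
        // The region has been reassigned to another server since the scanner picked it up.
        online.put(stale.regionName, new MetaRegionSketch("meta-region-1", "host-b:60020"));
        System.out.println(scanOneMetaRegion(online, stale, 3, r -> false));
    }
}
```

Because the comparison includes the server address, an entry that keeps its name but changes host no longer matches the scanner's copy, which is what lets the loop stop instead of retrying the downed server.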
  • stack (JIRA) at Oct 9, 2007 at 8:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-2017:
    --------------------------

    Fix Version/s: 0.15.0
    Status: Patch Available (was: Open)

    Builds locally.
  • Hadoop QA (JIRA) at Oct 9, 2007 at 9:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533514 ]

    Hadoop QA commented on HADOOP-2017:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12367392/trsa.patch
    against trunk revision r583037.

    @author +1. The patch does not contain any @author tags.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new compiler warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/testReport/
    Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/console

    This message is automatically generated.
  • stack (JIRA) at Oct 9, 2007 at 10:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-2017:
    --------------------------

    Fix Version/s: (was: 0.15.0)
  • stack (JIRA) at Oct 9, 2007 at 10:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack reassigned HADOOP-2017:
    -----------------------------

    Assignee: stack
  • stack (JIRA) at Oct 9, 2007 at 10:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533531 ]

    stack commented on HADOOP-2017:
    -------------------------------

    Applied patch. Now waiting to see if problem occurs again. If so, extra logging and thread dumps should help.
  • Hudson (JIRA) at Oct 10, 2007 at 12:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533722 ]

    Hudson commented on HADOOP-2017:
    --------------------------------

    Integrated in Hadoop-Nightly #267 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/267/])
  • stack (JIRA) at Oct 12, 2007 at 9:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack updated HADOOP-2017:
    --------------------------

    Resolution: Fixed
    Fix Version/s: 0.15.0
    Status: Resolved (was: Patch Available)

    Hasn't recurred since commit. HADOOP-2038 should also make this issue less likely. Also, this test has been moved into TestRegionServerExit. Resolving as fixed.

Discussion Overview
group: common-dev @
categories: hadoop
posted: Oct 9, '07 at 7:53p
active: Oct 12, '07 at 9:38p
posts: 10
users: 1
website: hadoop.apache.org...
irc: #hadoop
