Grokbase Groups HBase dev July 2012
Subject: Wondering what hbck should do in this situation
Hey devs,

I encountered an "interesting" situation with hbck in 0.94. We had a
region that was on HDFS but wasn't in .META., and hbck decided to add
it back:

ERROR: Region { meta => null, hdfs =>
hdfs://sfor3s24:10101/hbase/url_stumble_summary/159952764, deployed =>
} on HDFS, but not listed in META or deployed on any region server
12/07/17 23:46:03 INFO util.HBaseFsck: Patching .META. with
.regioninfo: {NAME =>
'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
'25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
159952764,}

Then when it tried to assign the region it got bounced between region servers:

Trying to reassign region...
12/07/17 23:46:04 INFO util.HBaseFsckRepair: Region still in
transition, waiting for it to become assigned: {NAME =>
'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
'25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
159952764,}
12/07/17 23:46:05 INFO util.HBaseFsckRepair: Region still in
transition, waiting for it to become assigned: {NAME =>
'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
'25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
159952764,}
etc

It turns out that this region only contained references (as in post-split
references) to a region that no longer existed, so opening the region
failed when it tried to open the referenced files:

2012-07-18 00:00:27,454 ERROR
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
open of region=url_stumble_summary,25467315:2009-12-28,1271922074820.159952764,
starting to roll back the global memstore size.
java.io.IOException: java.io.IOException:
java.io.FileNotFoundException: File does not exist:
/hbase/url_stumble_summary/208247386/default/2354161894779228084
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:550)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:463)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3729)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3677)
...
Caused by: java.io.IOException: java.io.FileNotFoundException: File
does not exist:
/hbase/url_stumble_summary/208247386/default/2354161894779228084
at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:405)
at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:258)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2918)
...
Caused by: java.io.FileNotFoundException: File does not exist:
/hbase/url_stumble_summary/208247386/default/2354161894779228084
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:102)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:547)
at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1252)
at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:66)
...


At first I was confused about why it was looking in another region,
until I saw the HalfStoreFileReader :)
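
To make the path in the stack trace concrete, here is a minimal Python sketch (not HBase code) of how a post-split reference file name could be mapped back to the file it points at. It assumes the 0.94-era convention that a reference is named `<hfile>.<encoded parent region>` and sits under `<table>/<region>/<family>/`. The function name `referred_to_path` and the reference file name in the example are hypothetical; the example name is simply built from the missing path shown in the trace.

```python
import posixpath


def referred_to_path(reference_path):
    """Map a reference file path to the path of the store file it points at.

    Assumes the layout <root>/<table>/<region>/<family>/<hfile>.<parent region>.
    Returns None when the file name has no ".<parent region>" suffix, in which
    case it is taken to be a plain store file rather than a reference.
    """
    family_dir, name = posixpath.split(reference_path)
    hfile, dot, parent_region = name.partition(".")
    if not dot or not hfile or not parent_region:
        return None
    region_dir, family = posixpath.split(family_dir)
    table_dir = posixpath.dirname(region_dir)
    # Same table and family, but under the parent region's directory.
    return posixpath.join(table_dir, parent_region, family, hfile)


# Hypothetical reference file inside region 159952764:
ref = "/hbase/url_stumble_summary/159952764/default/2354161894779228084.208247386"
print(referred_to_path(ref))
```

Under these assumptions the region being opened never reads its own directory for that store file; it goes straight to the parent region's directory, which is why a deleted parent makes the open fail.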

So this is a case where hbck made the cluster worse because the only
way to get rid of this region is to force unassign it, delete it from
.META. and then possibly also delete it from HDFS.

I'm wondering how this could be done better. Should we do more checks
when including that sort of region? Like, at least make sure we can
open it? And then what? Just report it?

Thx for reading this far,

J-D


  • Ted Yu at Jul 18, 2012 at 5:18 pm
    Adding a check on whether the referenced files can be found would help.
    If any of the referenced files isn't found, report and don't repair.
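
One way to picture the suggested check, as a minimal Python sketch rather than hbck code: before patching .META. for a region found only on HDFS, look up every file its references point at and decline the repair if any is missing. `check_orphan_region`, its arguments and the `exists` callback are hypothetical names introduced for illustration; a real check would query HDFS instead of an in-memory set.

```python
def check_orphan_region(region, referenced_files, exists):
    """Decide whether an HDFS-only region looks safe to add back to .META.

    region:           encoded region name, used only for reporting
    referenced_files: paths that the region's reference files point at
    exists:           callable(path) -> bool, e.g. a filesystem lookup
    Returns ("repair", []) when every referenced file is present, otherwise
    ("report", missing_paths) so the caller can log and skip the repair.
    """
    missing = [path for path in referenced_files if not exists(path)]
    if missing:
        return "report", missing
    return "repair", []


# Example with an in-memory stand-in for the filesystem.
present = {"/hbase/t/111/f/aaa"}
print(check_orphan_region("222", ["/hbase/t/111/f/aaa"], present.__contains__))
print(check_orphan_region("333", ["/hbase/t/999/f/bbb"], present.__contains__))
```

Passing `exists` in as a callback keeps the decision logic separate from the filesystem, so the same check can be exercised without a cluster.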

    Cheers
  • Ramkrishna.S.Vasudevan at Jul 19, 2012 at 4:53 am
    J-D

    Just going through the explanation, I feel that the region that had
    references is a parent region, and it should have an entry in META saying
    it is SPLIT and OFFLINE?

    Maybe while fixing those cases where we find something in HDFS and not in
    META, we need to see whether it was split?

    Was there any reason why the CatalogJanitor was not able to pick up this
    region for cleanup?

    I may be wrong here, J-D. Just going through the explanation, I am
    thinking this could be the scenario.

    Thanks for bringing this up. We will add this to our internal testing too.

    Regards
    Ram

  • Ramkrishna.S.Vasudevan at Jul 19, 2012 at 4:59 am
    J-D,
    Correction: if META does not have an entry, then we cannot know whether
    it was split or not. Apologies for that.

    I think we need to check for Reference files, and if opening them fails
    we need to report it. That should be the way.
    But we should also confirm whether this region was split properly, right?

    Regards
    Ram
  • Jonathan Hsieh at Jul 19, 2012 at 12:13 pm
    We actually ran into something similar on an upgrade from HBase 0.90 to
    HBase 0.92 -- a few regions would bounce around between regionservers,
    failing after going into the FAILED_OPEN RIT state.

    Here were the repair cases we considered:
    1) What do you do if the parent file is not present? Sideline the
       reference files. Bulk load any data files. Without the original file we
       cannot really save anything. If the parent is not present, it may have
       been moved, but its data is still present.
    2) What do you do if the parent file is present? I think you can sideline
       the reference files. The original file is present somewhere in HDFS, so
       that means the data is not lost.

    Another related idea is to have a quarantine directory for regions/files
    that are repeatedly ill-behaved. For example, if we tried to read a
    reference file multiple times and failed, quarantine the file and try
    again. We had another case -- we ran into a truncated hfile, and the same
    strategy would have gotten the cluster working (and still left the
    possibility of data recovery).
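
The quarantine idea can be sketched roughly as follows. This is a minimal Python illustration on a local filesystem, not HBase code: the `Quarantine` class, its `max_failures` threshold and the directory names are all hypothetical. The point it shows is that a file is moved aside, not deleted, once opening it has failed a set number of times.

```python
import os
import shutil
import tempfile
from collections import Counter


class Quarantine:
    """Move a file aside once opening it has failed too many times."""

    def __init__(self, sideline_dir, max_failures=3):
        self.sideline_dir = sideline_dir
        self.max_failures = max_failures
        self.failures = Counter()

    def record_failure(self, path):
        """Count one failed open; return the new location if quarantined, else None."""
        self.failures[path] += 1
        if self.failures[path] < self.max_failures:
            return None
        os.makedirs(self.sideline_dir, exist_ok=True)
        # Simplification: only the base name is kept. A fuller version would
        # preserve the region/family part of the path to avoid name clashes.
        target = os.path.join(self.sideline_dir, os.path.basename(path))
        shutil.move(path, target)  # the file is kept for later recovery
        del self.failures[path]
        return target


# Example run in a throwaway directory.
root = tempfile.mkdtemp()
bad = os.path.join(root, "bad_reference")
open(bad, "w").close()
q = Quarantine(os.path.join(root, "quarantine"), max_failures=2)
first = q.record_failure(bad)   # below the threshold, nothing moves
second = q.record_failure(bad)  # threshold reached, file is moved aside
```

After the move, a retry of the region open would no longer see the offending file, while the data stays available for manual recovery.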

    Jon.

    --
    // Jonathan Hsieh (shay)
    // Software Engineer, Cloudera
    // jon@cloudera.com
  • Jean-Daniel Cryans at Jul 19, 2012 at 5:50 pm

    On Wed, Jul 18, 2012 at 9:56 PM, Ramkrishna.S.Vasudevan wrote:
    > I think we need to check for Reference files and if the opening fails we
    > need to report it. That should be the way.
    > But we should also confirm whether this region was split properly, right?

    That's what I'm wondering about. It seems to me that hbck currently is
    overly aggressive fixing things (see also HBASE-6417 where it merged
    .META.). So should we have all the heuristics to detect problems and
    then add the corner cases after as people find them? Or should we let
    the users decide what should be fixed? It could be that we should ask
    the users more questions. I'm thinking out loud here.

    J-D
  • Jonathan Hsieh at Jul 19, 2012 at 5:56 pm
    Jimmy and I have been adding features essentially as we've needed them,
    including some options that limit fixes to particular tables, and limit the
    kinds of fixes that are applied.

    There is a jira for making the repairs interactive -- either an hbck shell
    or an interactive mode that provides a series of y/n questions. I'd be
    amenable to any of these kinds of improvements.

    Jon.
  • Jimmy Xiang at Jul 19, 2012 at 6:02 pm
    HBASE-5324 is the one I filed on interactive hbck. We can use it if
    there is no duplicate one.

    Thanks,
    Jimmy
  • Jean-Daniel Cryans at Jul 19, 2012 at 6:05 pm
    Thanks for the jira, Jimmy. It seems to me that we should aim for a
    dry-run feature first and then consider the interactive part. At least it
    would give the user an opportunity to fix problems that would
    otherwise make things worse.

    J-D
  • Lars hofhansl at Jul 19, 2012 at 7:12 pm
    +1 on a dry-run option. All that's needed might just be a bit more logging on a normal "non-fix" run.

    Interactive can be very simple with just some y/n decision points. Unix's fsck could be a potential guideline.
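
As a rough illustration of how small the dry-run and y/n pieces could be, here is a Python sketch. It is not hbck code: `run_repairs`, the mode names and the repair descriptions are hypothetical, chosen only to show the control flow.

```python
def run_repairs(repairs, mode="dry-run", ask=input):
    """Apply repairs according to mode and return the descriptions applied.

    repairs: list of (description, action) pairs, action being a no-arg callable
    mode:    "dry-run" only prints what would be done,
             "interactive" asks y/n before each repair,
             "fix" applies everything without asking
    ask:     prompt function, injectable so the loop can be tested
    """
    if mode not in ("dry-run", "interactive", "fix"):
        raise ValueError("unknown mode: %s" % mode)
    applied = []
    for description, action in repairs:
        if mode == "dry-run":
            print("would fix: " + description)
            continue
        if mode == "interactive":
            answer = ask("fix: %s? [y/n] " % description)
            if answer.strip().lower() != "y":
                print("skipped: " + description)
                continue
        action()
        applied.append(description)
    return applied


# Example: two pretend repairs that only record that they ran.
log = []
repairs = [
    ("patch .META. for region A", lambda: log.append("A")),
    ("patch .META. for region B", lambda: log.append("B")),
]
run_repairs(repairs)  # dry run: prints two "would fix" lines, changes nothing
```

Making dry-run the default means nothing is changed unless the operator explicitly asks for it, which matches the concern raised earlier in the thread.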


    The issue at hand seems just like an oversight in the current implementation, though.

    -- Lars



Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jul 18, '12 at 3:53p
active: Jul 19, '12 at 7:12p
posts: 10
users: 6
website: hbase.apache.org
