For background, see
https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/impala-user/HTuNMUwnB0w

I found the code below in HBaseScanNode.java, line 215:

         if (prevEndKey != null &&
             Bytes.compareTo(prevEndKey, curRegStartKey) == 0) {
           // the current region starts where the previous one left off;
           // extend the key range
           setKeyRangeEnd(keyRange, curRegEndKey);
         } else {
           // create a new HBaseKeyRange (and TScanRange2/TScanRangeLocations
           // to go with it).
           keyRange = new THBaseKeyRange();
           setKeyRangeStart(keyRange, curRegStartKey);
           setKeyRangeEnd(keyRange, curRegEndKey);
           TScanRangeLocations scanRangeLocation = new TScanRangeLocations();
           scanRangeLocation.addToLocations(
               new TScanRangeLocation(addressToTNetworkAddress(locEntry.getKey())));
           result.add(scanRangeLocation);
           TScanRange scanRange = new TScanRange();
           scanRange.setHbase_key_range(keyRange);
           scanRangeLocation.setScan_range(scanRange);
         }

*As a result, if more than one region from the same RegionServer sits at
the end of the region list, the last one will be ignored.*

For example:

| Name | Region Server | Start Key | End Key |
|------|---------------|-----------|---------|
| testsplit,,1366291520731.b4c5d5b8853dd959b241809a6e588027. | vm-9e88-17b6.nam.nsroot.net:60030 | | key20156 |
| testsplit,key20156,1366291520731.111a77775ae2c53b71e443282ec29396. | vm-0660-ba06.nam.nsroot.net:60030 | key20156 | key307 |
| testsplit,key307,1366291521233.1bf85e44b36a529c500a65854eb8ca95. | vm-9e88-17b6.nam.nsroot.net:60030 | key307 | key41240 |
| testsplit,key41240,1366291521233.4332bac0b62215de6ea0865ee5cdacf1. | vm-0660-ba06.nam.nsroot.net:60030 | key41240 | key51786 |
| testsplit,key51786,1366291521611.301f41e3b00a4a3d8b90e7384355218b. | vm-9e88-17b6.nam.nsroot.net:60030 | key51786 | key63671 |
| testsplit,key63671,1366291521611.02e4ecc9523a31975b1bbec35cd5ebe4. | vm-0660-ba06.nam.nsroot.net:60030 | key63671 | key75747 |
| testsplit,key75747,1366291521967.c5b322b262751a084144e1ee90a16978. | vm-0660-ba06.nam.nsroot.net:60030 | key75747 | |


*The last region (start key: key75747) will not be stored in the keyRange,
so it will be skipped during the scan.*
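
The reported behaviour can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions, not the Impala code: keys are plain strings instead of byte arrays, `KeyRange` is a hypothetical stand-in for THBaseKeyRange, and `setKeyRangeEnd` is assumed to ignore an empty region end key, as a reply further down in this thread describes. The input is the list of regions hosted on vm-0660-ba06 from the table above.

```java
import java.util.ArrayList;
import java.util.List;

class KeyRangeCoalesceSketch {
    /** Hypothetical stand-in for THBaseKeyRange; an empty string means "unbounded". */
    static final class KeyRange {
        String start = "";
        String stop = "";

        @Override
        public String toString() {
            return "[" + start + ", " + (stop.isEmpty() ? "<end of table>" : stop) + ")";
        }
    }

    /** Assumed behaviour: an empty region end key leaves the range untouched. */
    static void setKeyRangeEnd(KeyRange range, String regionEndKey) {
        if (!regionEndKey.isEmpty()) {
            range.stop = regionEndKey;
        }
    }

    /** Mirrors the if/else quoted above for the regions of one RegionServer. */
    static List<KeyRange> coalesce(String[][] regions) {
        List<KeyRange> result = new ArrayList<>();
        KeyRange keyRange = null;
        String prevEndKey = null;
        for (String[] region : regions) {
            String startKey = region[0];
            String endKey = region[1];
            if (prevEndKey != null && prevEndKey.equals(startKey)) {
                // Adjacent to the previous region: extend the current range.
                setKeyRangeEnd(keyRange, endKey);
            } else {
                // Gap between regions: start a new range.
                keyRange = new KeyRange();
                keyRange.start = startKey;
                setKeyRangeEnd(keyRange, endKey);
                result.add(keyRange);
            }
            prevEndKey = endKey;
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] regionsOnVm0660 = {
            {"key20156", "key307"},
            {"key41240", "key51786"},
            {"key63671", "key75747"},
            {"key75747", ""},   // last region of the table, empty end key
        };
        for (KeyRange range : coalesce(regionsOnVm0660)) {
            System.out.println(range);
        }
    }
}
```

Under these assumptions the last range printed is `[key63671, key75747)` rather than one that runs to the end of the table, so rows at or after key75747 are never scanned.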


  • Jung-Yup Lee at Apr 23, 2013 at 5:21 am

    Regarding the condition at HBaseScanNode.java, line 215:

        if (prevEndKey != null &&
            Bytes.compareTo(prevEndKey, curRegStartKey) == 0) {

    At the last iteration of the for loop, prevEndKey is 'key75747' and
    curRegStartKey is also 'key75747', so compareTo returns 0. curRegEndKey,
    the end key of the last region (HConstants.EMPTY_END_ROW), should then
    be set properly. I didn't debug with jdb, so I am not 100% sure.

  • Kane Xiej at Apr 23, 2013 at 7:03 am
    At the last iteration of the for loop, setKeyRangeEnd(keyRange,
    curRegEndKey) does nothing. See line 297.
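
To make the point concrete, here is a minimal sketch contrasting the behaviour described in this reply with one possible correction. It is not the code at line 297 and not the patch attached to IMPALA-300, neither of which is shown in this thread. `KeyRange`, `range`, and both setter names are hypothetical, keys are strings rather than byte arrays, and the real method may also have to respect a stop key coming from the query, which this model ignores.

```java
class KeyRangeEndSketch {
    /** String stand-in for HConstants.EMPTY_END_ROW (an empty byte array in HBase). */
    static final String EMPTY_END_ROW = "";

    /** Hypothetical stand-in for THBaseKeyRange; an empty stop means "unbounded". */
    static final class KeyRange {
        String start = "";
        String stop = "";
    }

    static KeyRange range(String start, String stop) {
        KeyRange r = new KeyRange();
        r.start = start;
        r.stop = stop;
        return r;
    }

    /** Behaviour as described in the reply: an empty end key is a no-op. */
    static void setEndIgnoringEmpty(KeyRange r, String regionEndKey) {
        if (!regionEndKey.equals(EMPTY_END_ROW)) {
            r.stop = regionEndKey;
        }
    }

    /** One possible correction: an empty end key makes the range unbounded. */
    static void setEndTreatingEmptyAsUnbounded(KeyRange r, String regionEndKey) {
        r.stop = regionEndKey;
    }

    public static void main(String[] args) {
        // Range built from region key63671..key75747, then extended with the
        // last region, whose end key is empty.
        KeyRange reported = range("key63671", "key75747");
        setEndIgnoringEmpty(reported, EMPTY_END_ROW);
        System.out.println("reported stop key: '" + reported.stop + "'");

        KeyRange corrected = range("key63671", "key75747");
        setEndTreatingEmptyAsUnbounded(corrected, EMPTY_END_ROW);
        System.out.println("corrected stop key: '" + corrected.stop + "'");
    }
}
```

In the first case the stop key stays at key75747, which is why the last region is dropped; in the second it becomes empty, meaning the scan continues to the end of the table.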

  • Jung-Yup Lee at Apr 23, 2013 at 9:08 am
    You're right. I think it may be a bug.
  • Alan at May 16, 2013 at 5:44 pm
    Thanks for the report. I've filed IMPALA-356 to track it.
  • Jung-Yup Lee at May 16, 2013 at 10:31 pm
    I previously attached a patch that fixes this bug to IMPALA-300
    <https://issues.cloudera.org/browse/IMPALA-300>.
    The patch also covers what Alan points out in IMPALA-356
    <https://issues.cloudera.org/browse/IMPALA-356>.

    Thanks

Discussion Overview

group: impala-user
categories: hadoop
posted: Apr 22, '13 at 11:11a
active: May 16, '13 at 10:31p
posts: 6
users: 3
website: cloudera.com
irc: #hadoop

3 users in discussion

Jung-Yup Lee: 3 posts, Kane Xiej: 2 posts, Alan: 1 post
