I am not quite sure, but getting a region location from the client cache by
calling getRegionLocation may cause problems like the one in your case.

<fe/src/main/java/com/cloudera/impala/planner/HBaseScanNode.java>

  private List<HRegionLocation> getRegionsInRange(HTable hbaseTbl,
      final byte[] startKey, final byte[] endKey) throws IOException {
    boolean endKeyIsEndOfTable =
        Bytes.equals(endKey, HConstants.EMPTY_END_ROW);
    if ((Bytes.compareTo(startKey, endKey) > 0) &&
        (endKeyIsEndOfTable == false)) {
      throw new IllegalArgumentException(
        "Invalid range: " + Bytes.toStringBinary(startKey) +
        " > " + Bytes.toStringBinary(endKey));
    }
    List<HRegionLocation> regionList = new ArrayList<HRegionLocation>();
    byte [] currentKey = startKey;
    do {
      // <== I think hbaseTbl.getRegionLocation(currentKey, true) is correct here
      HRegionLocation regionLocation = hbaseTbl.getRegionLocation(currentKey);
      regionList.add(regionLocation);
      currentKey = regionLocation.getRegionInfo().getEndKey();
    } while (!Bytes.equals(currentKey, HConstants.EMPTY_END_ROW) &&
             (endKeyIsEndOfTable == true ||
              Bytes.compareTo(currentKey, endKey) < 0));
    return regionList;
  }
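
To make the concern concrete, here is a minimal, self-contained sketch of the same region walk against a toy locator. It is only a model under stated assumptions: `Region`, `Locator`, `putMeta` and `locate` are hypothetical stand-ins, not HBase classes, and string keys replace byte arrays (the empty string plays the role of the empty start/end row). It shows one thing only: a lookup answered from a client-side cache can keep returning the old server after a region has moved, while a lookup with reload = true sees the new one. It does not claim to reproduce the missing rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RegionWalkSketch {
    /** Hypothetical stand-in for HRegionLocation: range [startKey, endKey) on one server. */
    static final class Region {
        final String startKey, endKey, server;
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey; this.endKey = endKey; this.server = server;
        }
    }

    /** Hypothetical locator: `meta` is authoritative, `cache` may hold stale entries. */
    static final class Locator {
        final TreeMap<String, Region> meta = new TreeMap<>();
        final TreeMap<String, Region> cache = new TreeMap<>();

        void putMeta(String startKey, String endKey, String server) {
            meta.put(startKey, new Region(startKey, endKey, server));
        }

        Region locate(String key, boolean reload) {
            if (!reload) {
                Map.Entry<String, Region> e = cache.floorEntry(key);
                // Serve from the cache only if the cached range really covers the key.
                if (e != null && (e.getValue().endKey.isEmpty()
                        || key.compareTo(e.getValue().endKey) < 0)) {
                    return e.getValue();
                }
            }
            Region fresh = meta.floorEntry(key).getValue();
            cache.put(fresh.startKey, fresh);
            return fresh;
        }
    }

    /** Same loop shape as getRegionsInRange above, with an explicit reload flag. */
    static List<Region> regionsInRange(Locator loc, String startKey, String endKey,
                                       boolean reload) {
        boolean endKeyIsEndOfTable = endKey.isEmpty();
        if (startKey.compareTo(endKey) > 0 && !endKeyIsEndOfTable) {
            throw new IllegalArgumentException("Invalid range: " + startKey + " > " + endKey);
        }
        List<Region> regions = new ArrayList<>();
        String currentKey = startKey;
        do {
            Region r = loc.locate(currentKey, reload);
            regions.add(r);
            currentKey = r.endKey;
        } while (!currentKey.isEmpty()
                && (endKeyIsEndOfTable || currentKey.compareTo(endKey) < 0));
        return regions;
    }

    public static void main(String[] args) {
        Locator loc = new Locator();
        loc.putMeta("", "0.3334863476115134", "vm-0660-ba06");
        loc.putMeta("0.3334863476115134", "key0", "vm-9e88-17b6");
        loc.putMeta("key0", "key90", "vm-0660-ba06");
        loc.putMeta("key90", "", "vm-9e88-17b6");
        regionsInRange(loc, "", "", false);          // first walk fills the cache
        loc.putMeta("key90", "", "vm-0660-ba06");    // the last region moves
        System.out.println(regionsInRange(loc, "", "", false).get(3).server); // vm-9e88-17b6
        System.out.println(regionsInRange(loc, "", "", true).get(3).server);  // vm-0660-ba06
    }
}
```

In the real client, the corresponding change would be the one suggested in the comment above: passing true as the second argument of getRegionLocation so the location is looked up again instead of being taken from the cache.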



On Monday, April 22, 2013 2:59:41 PM UTC+9, kane...@gmail.com wrote:

Cluster information:
2 nodes in total, with CDH 4.2 installed from RPMs and Impala beta 0.6.

At first HBase, Hive, and Impala all worked well, but after the regions
change (a region splits or is moved to another region server), Impala
returns wrong results, while HBase and Hive still work correctly.



For example,



A. 4 regions are distributed across 2 region servers.



| Name | Region Server | Start Key | End Key | Requests |
|---|---|---|---|---|
| testhive,,1365649095895.35878eef07f63754bc6e08314b4d6559. | vm-0660-ba06.nam.nsroot.net:60030 | | 0.3334863476115134 | 2 |
| testhive,0.3334863476115134,1365649095895.b938a2f65d4947d017bb246e63daf8f5. | vm-9e88-17b6.nam.nsroot.net:60030 | 0.3334863476115134 | key0 | 12 |
| testhive,key0,1365649096332.8a6ba2a39c67ce00fb41b5f760dc86e4. | vm-0660-ba06.nam.nsroot.net:60030 | key0 | key90 | 12 |
| testhive,key90,1365649096332.b69882f6e5337f41036aba0f5a8f9347. | vm-9e88-17b6.nam.nsroot.net:60030 | key90 | | 12 |





hbase(main):001:0> count 'testhive'
30 row(s) in 0.1650 seconds

hive> select * from testhive;
(MapReduce is not installed on these servers, so only select * can be run.)
OK
(list 30 rows)

[vm-9e88-17b6.nam.nsroot.net:21000] > select count(*) from testhive;
Query: select count(*) from testhive
Query finished, fetching results ...
30
Returned 1 row(s) in 2.68s





B. Shut down the region server on vm-9e88-17b6; all regions are then moved
to vm-0660-ba06.



| Name | Region Server | Start Key | End Key | Requests |
|---|---|---|---|---|
| testhive,,1365649095895.35878eef07f63754bc6e08314b4d6559. | vm-0660-ba06.nam.nsroot.net:60030 | | 0.3334863476115134 | 6 |
| testhive,0.3334863476115134,1365649095895.b938a2f65d4947d017bb246e63daf8f5. | vm-0660-ba06.nam.nsroot.net:60030 | 0.3334863476115134 | key0 | 0 |
| testhive,key0,1365649096332.8a6ba2a39c67ce00fb41b5f760dc86e4. | vm-0660-ba06.nam.nsroot.net:60030 | key0 | key90 | 35 |
| testhive,key90,1365649096332.b69882f6e5337f41036aba0f5a8f9347. | vm-0660-ba06.nam.nsroot.net:60030 | key90 | | 0 |







hbase(main):001:0> count 'testhive'
30 row(s) in 0.0970 seconds

hive> select * from testhive;
OK
(list 30 rows)

[vm-9e88-17b6.nam.nsroot.net:21000] > select count(*) from testhive;
Query: select count(*) from testhive
Query finished, fetching results ...
20
Returned 1 row(s) in 1.24s





I also found that when the Impala query runs, the request counts of the
first 3 regions increase, but the last region does not receive any requests.



Before querying:

| Name | Region Server | Start Key | End Key | Requests |
|---|---|---|---|---|
| testhive,,1365649095895.35878eef07f63754bc6e08314b4d6559. | vm-0660-ba06.nam.nsroot.net:60030 | | 0.3334863476115134 | 24 |
| testhive,0.3334863476115134,1365649095895.b938a2f65d4947d017bb246e63daf8f5. | vm-0660-ba06.nam.nsroot.net:60030 | 0.3334863476115134 | key0 | 101 |
| testhive,key0,1365649096332.8a6ba2a39c67ce00fb41b5f760dc86e4. | vm-0660-ba06.nam.nsroot.net:60030 | key0 | key90 | 136 |
| testhive,key90,1365649096332.b69882f6e5337f41036aba0f5a8f9347. | vm-0660-ba06.nam.nsroot.net:60030 | key90 | | 57 |



After querying:



| Name | Region Server | Start Key | End Key | Requests |
|---|---|---|---|---|
| testhive,,1365649095895.35878eef07f63754bc6e08314b4d6559. | vm-0660-ba06.nam.nsroot.net:60030 | | 0.3334863476115134 | 26 |
| testhive,0.3334863476115134,1365649095895.b938a2f65d4947d017bb246e63daf8f5. | vm-0660-ba06.nam.nsroot.net:60030 | 0.3334863476115134 | key0 | 112 |
| testhive,key0,1365649096332.8a6ba2a39c67ce00fb41b5f760dc86e4. | vm-0660-ba06.nam.nsroot.net:60030 | key0 | key90 | 147 |
| testhive,key90,1365649096332.b69882f6e5337f41036aba0f5a8f9347. | vm-0660-ba06.nam.nsroot.net:60030 | key90 | | 57 |
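
Subtracting the "before" request counts from the "after" counts makes the pattern explicit. The sketch below uses only the numbers reported above; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class RequestDeltaSketch {
    /** Per-region difference between two request-count snapshots (after minus before). */
    static long[] deltas(long[] before, long[] after) {
        long[] d = new long[before.length];
        for (int i = 0; i < before.length; i++) {
            d[i] = after[i] - before[i];
        }
        return d;
    }

    public static void main(String[] args) {
        // Regions in table order: start keys "", 0.3334863476115134, key0, key90.
        long[] before = {24, 101, 136, 57};
        long[] after  = {26, 112, 147, 57};
        System.out.println(Arrays.toString(deltas(before, after))); // [2, 11, 11, 0]
    }
}
```

The last entry is 0: the region starting at key90 received no requests during the Impala query, which fits the count dropping from 30 to 20.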



When querying through HBase or Hive, all regions receive requests.





If a region splits into 2, the new regions sometimes do not receive
requests either, but not always.



Has anyone encountered the same issue?



Please help me if you have any ideas. Thanks.


  • 静 谢 at Apr 23, 2013 at 1:44 am
    Hi Lee,

    Though getRegionLocation(currentKey) is deprecated, it still works in
    my test.

    The cause may be the one discussed here:

    https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/impala-user/r5AgsHu_9j4
    On April 23, at 5:14 AM, Jung-Yup Lee wrote:
    [quoted text hidden]
