Hi Erdem,

You're right. In order to do it efficiently in a single join, Impala would
have to do an "indexed nested loop join", but our current join strategy is
limited to hash join at this moment. This is an area of improvement that
we're aware of.

As for my example, it turns out that an IN clause won't be translated into
an HBase row key scan parameter. It'll also be a full table scan. Only =,
<(=), >(=) will be translated into a row key lookup or range scan. We can
definitely improve it. I've filed IMPALA-874
<https://issues.cloudera.org/browse/IMPALA-874> to track it.
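
Since only equality and range predicates on the row key are turned into lookups, one possible interim workaround is to issue one equality query per key instead of a single IN list. The following is a minimal sketch, not a confirmed recommendation: the helper name `equality_lookups` is made up for illustration, the table and column names are the ones used in this thread, and the backslash escaping assumes Impala's usual string literal rules. Checking each generated query with EXPLAIN for a start key / stop key pair is advisable.

```python
def equality_lookups(table, key_column, keys):
    """Build one single-row lookup query per key (hypothetical helper)."""
    queries = []
    for key in keys:
        # Escape backslashes first, then single quotes, so the key stays
        # a valid string literal (assumes backslash escaping is accepted).
        literal = key.replace("\\", "\\\\").replace("'", "\\'")
        queries.append(
            f"select * from {table} where {key_column} = '{literal}'"
        )
    return queries


# Print the queries that would replace: row_key in ('a', 'b')
for query in equality_lookups("hbase_user", "row_key", ["a", "b"]):
    print(query)
```

Each query is meant to be run separately by the client; this trades one round trip per key for avoiding the full table scan.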


Now, for the lease exception, have you tried the solution described here:
http://hbase.apache.org/book/trouble.client.html

Thanks,
Alan

On Tue, Mar 11, 2014 at 1:26 PM, Erdem Agaoglu wrote:

Hi Alan,

Thanks for the fast response. From what I understand, it's not possible to
do it in a single query using a standard join. So it won't be easy to, for
example, group those users based on their attributes. This may be proposed
as an improvement.

On the other hand, the method you proposed does not seem to work either.
The relevant part of the plan for a single select query like "select * from
hbase_user where row_key = 'a'" is:
0:SCAN HBASE
table:hbase_user
start key: a
stop key: a\0
which is correct, and executes fast.
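
The start/stop pair in that plan can be read as a half-open range over byte-ordered keys: appending a zero byte to the key gives the smallest key that sorts strictly after it, so the range holds exactly one possible row key. A small sketch of that reasoning, assuming the usual HBase scan semantics (start inclusive, stop exclusive, keys compared byte by byte); the function names are made up for illustration:

```python
def point_lookup_range(row_key: bytes):
    """Return (start, stop) for a scan that can match only row_key."""
    # row_key + b"\x00" is the immediate successor of row_key in byte order.
    return row_key, row_key + b"\x00"


def in_range(key: bytes, start: bytes, stop: bytes) -> bool:
    """Start is inclusive, stop is exclusive."""
    return start <= key < stop


start, stop = point_lookup_range(b"a")
for candidate in [b"a", b"a\x00", b"aa", b"b"]:
    print(candidate, in_range(candidate, start, stop))
```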

But the similar "select * from hbase_user where row_key in ('a')" plans as:
0:SCAN HBASE
table:hbase_user
predicates: row_key IN ('a')
which doesn't seem correct, and does not execute as fast as it should. And
for tables with some data in them, the query fails with
"org.apache.hadoop.hbase.regionserver.LeaseException: lease 'xxx' does not
exist". My guess is that this is also a full table scan, only with some
filters applied. But I'm not sure.

Cheers,

On Tue, Mar 11, 2014 at 9:01 PM, Alan Choi wrote:

Hi Erdem,

First, the HBase row key must be mapped to a string column. Otherwise,
it'll do a full table scan on HBase.

Now, here's the sequence of queries:

1. Retrieve the list of user names into the local env first.
select u.name, count(*) as visit
from hdfs_log u
group by u.name
order by visit desc limit 50;

2. For each user, retrieve the details from HBase:
select *
from hbase_user
where row_key in (<the list of user names>)
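
The two steps above could be driven from a client roughly as follows. This is only a sketch under stated assumptions: `run_query` is a hypothetical callable (SQL string in, list of row tuples out) standing in for whatever client library is used, and the table names are the ones from this thread. The second step keeps the IN list as written in the steps; as discussed elsewhere in this thread, that form may still be planned as a full scan, so the plan is worth checking with EXPLAIN.

```python
def quote(value):
    """Render a string literal (assumes backslash escaping is accepted)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def fetch_top_user_details(run_query, limit=50):
    """Two-step lookup: top users from HDFS, then their rows from HBase.

    run_query is a hypothetical callable, not part of any Impala API.
    """
    # Step 1: pull the most active user names into the local environment.
    top_users = run_query(
        "select u.name, count(*) as visit from hdfs_log u "
        f"group by u.name order by visit desc limit {int(limit)}"
    )
    names = [row[0] for row in top_users]
    if not names:
        return []  # an empty IN list would not be valid SQL

    # Step 2: fetch the details of those users from the HBase table.
    in_list = ", ".join(quote(name) for name in names)
    return run_query(f"select * from hbase_user where row_key in ({in_list})")


# Stand-in for a real client, only so the sketch runs on its own.
def fake_run_query(sql):
    print(sql)
    if "hdfs_log" in sql:
        return [("alice", 12), ("bob", 7)]
    return [("alice", "attrs"), ("bob", "attrs")]


rows = fetch_top_user_details(fake_run_query)
```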

Thanks,
Alan


On Tue, Mar 11, 2014 at 10:09 AM, Erdem Agaoglu <erdem.agaoglu@gmail.com> wrote:

Hi all,

Quoting the Impala docs directly, one HBase use case example is:

Or the HBase table could be joined with a larger Impala-managed table.
For example, analyze the large Impala table representing web traffic for a
site and pick out 50 users who view the most pages. Join that result with
the wide user table in HBase to look up attributes of those users. The
HBase side of the join would result in 50 efficient single-row lookups in
HBase, rather than scanning the entire user table.

Are there any examples of doing this with a single query? I simply tried
something like

select u.name
from logs l
join users u on l.user = u.id
where l.time > '12:00' and l.time < '12:01'

and ended up scanning the entire HBase table. It doesn't matter even if the
predicate is something like l.user = 'userid'.
It seems I am missing something very basic here.

Any ideas?

To unsubscribe from this group and stop receiving emails from it, send
an email to impala-user+unsubscribe@cloudera.org.

--
erdem agaoglu


Discussion Overview
group: impala-user
categories: hadoop
posted: Mar 11, '14 at 5:09p
active: Mar 12, '14 at 8:50a
posts: 5
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Erdem Agaoglu: 3 posts
Alan Choi: 2 posts
