Hi Erdem,

First, the HBase row key must be mapped to a string column. Otherwise,
Impala will do a full table scan on HBase.
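
As a rough sketch of such a mapping, assuming the table is defined through Hive's HBase storage handler: the HBase table name (users), the column family (info), and the attribute columns below are hypothetical; only hbase_user and row_key come from this thread.

```sql
-- Run in the Hive shell. Names other than hbase_user and row_key are
-- made up for illustration.
CREATE EXTERNAL TABLE hbase_user (
  row_key STRING,   -- mapped to the HBase row key via ":key" below
  name    STRING,
  email   STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- One entry per table column, in order; ":key" is the row key.
  "hbase.columns.mapping" = ":key,info:name,info:email"
)
TBLPROPERTIES ("hbase.table.name" = "users");
```

After creating the table in Hive, Impala typically needs its metadata refreshed (for example with INVALIDATE METADATA) before the table becomes visible.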

Now, here's the sequence of queries:

1. Retrieve the list of user names into the local environment first:
select u.name, count(*) as visit
from hdfs_log u
group by u.name
order by visit desc limit 50;

2. For each user, retrieve the details from HBase:
select *
from hbase_user
where row_key in (<the list of user names>)
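
For illustration only, here is what step 2 might look like once the names from step 1 are filled in. The three names are hypothetical placeholders, not real data, and the list would normally hold up to 50 entries.

```sql
-- Hypothetical user names returned by step 1.
select *
from hbase_user
where row_key in ('alice', 'bob', 'carol');

-- One way to check whether the row key predicate is used is to look at
-- the plan before running the query.
explain
select *
from hbase_user
where row_key in ('alice', 'bob', 'carol');
```

If the plan still shows a full scan, issuing one query per user with an equality predicate (row_key = 'alice') is an alternative worth trying.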

Thanks,
Alan

On Tue, Mar 11, 2014 at 10:09 AM, Erdem Agaoglu wrote:

Hi all,

Quoting the Impala docs directly, one HBase use case example is:

Or the HBase table could be joined with a larger Impala-managed table. For
example, analyze the large Impala table representing web traffic for a site
and pick out 50 users who view the most pages. Join that result with the
wide user table in HBase to look up attributes of those users. The HBase
side of the join would result in 50 efficient single-row lookups in HBase,
rather than scanning the entire user table.

Are there any examples of doing this with a single query? I simply tried
something like

select u.name
from logs l
join users u on l.user = u.id
where l.time > '12:00' and l.time < '12:01'

and ended up scanning the entire HBase table. It doesn't matter even if the
predicate is something like l.user = 'userid'.
It seems I am missing something very basic here.

Any ideas?

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.

Discussion Overview
group: impala-user @
categories: hadoop
posted: Mar 11, '14 at 5:09p
active: Mar 12, '14 at 8:50a
posts: 5
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Erdem Agaoglu: 3 posts Alan Choi: 2 posts
