FAQ
Hi Marcel,

Thanks, that was indeed the issue. I was expecting seqid to be unique
across table, but it was unique only within the day partition(the table was
partitioned as year/month/day).


Thanks and Regards,
Suresh
On Thursday, 24 January 2013 21:06:22 UTC+5:30, Marcel Kornacker wrote:

Hi Suresh,

if you have duplicates in your {demographics,useractivity}.seqid, you
would end up with a join result that is larger than either one of the
tables itself. (If there were only one seqid value in all rows in both
tables, your result would have |demographics| x |useractivity| rows.)

To verify, run "select count(distinct seqid) from demographics" and
the same for the useractivity table. Keep in mind that aggregation is
done in an in-memory hash table, for which you need enough memory
available.

Marcel
On Thu, Jan 24, 2013 at 4:50 AM, Suresh K wrote:
Hi,

Currently i am trying to join 2 tables. but see unexpected results from
join. Below are the results.
========================================================
[localhost:21000] > select count(*) from demographics;
6132409
[localhost:21000] > select count(*) from useractivity;
33282073
[localhost:21000] > select count(*) from demographics join useractivity on
(demographics.seqid = useractivity.seqid);
74322301
[localhost:21000] >
=========================================================

Is the result correct? since i was making a inner join i expected count(*)
in the join to be max of first 2 queries(i.e 33282073) , but instead getting
this high number(74322301). Is my understanding correct?

Unfortunately there is some problem running hive query on the setup and
hence couldn't cross verify with Hive.

any pointers to solve this would be helpful.

Thanks and Regards
Suresh

--

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 2 | next ›
Discussion Overview
groupimpala-user @
categorieshadoop
postedJan 24, '13 at 12:50p
activeJan 31, '13 at 11:45a
posts2
users1
websitecloudera.com
irc#hadoop

1 user in discussion

Suresh K: 2 posts

People

Translate

site design / logo © 2023 Grokbase