hive> select count(1) from pageviews;
OK
55504011
Time taken: 290.216 seconds

hive> select count(1) from files f;
Ended Job = job_200909171715_20347
OK
10164516
Time taken: 29.946 seconds

hive> select count(1) from files f join pageviews p on f.id = p.file_id;
OK
89375203
Time taken: 164.767 seconds

Any hint on what is going wrong here? In our dataset, each pageview
should be related to 0 or 1 files, so the join count should not exceed
the pageview count.

Thanks,
Edward


  • Zheng Shao at Feb 4, 2010 at 5:41 pm
    Can you post the results of "explain" for all 3 queries?


    Zheng
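
The plans Zheng asks for can be produced by prefixing each query with EXPLAIN. A minimal sketch, reusing the three queries from the original post unchanged:

```sql
-- Print the query plan for each count instead of running the job.
explain select count(1) from pageviews;
explain select count(1) from files f;
explain select count(1) from files f join pageviews p on f.id = p.file_id;
```

The join plan is the one to read most closely, since it shows which columns are used as join keys.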
  • Edward Capriolo at Feb 4, 2010 at 6:24 pm
    Zheng,

    My mistake. I made some incorrect assumptions about my source data. We
    should add referential integrity to prevent me from making this
    mistake again. NOT!

    Thanks again,
    Edward
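
Edward does not say which assumption was wrong. One possibility, offered here only as an assumption and not confirmed in the thread, is that `files.id` is not unique, so a single pageview matches several rows in files and the join fans out. A sketch of a check for that case, using only the table and column names from the queries above:

```sql
-- Assumption: files.id was expected to be unique.
-- List ids that appear more than once, with their row counts.
select t.id, t.cnt
from (select id, count(1) as cnt from files group by id) t
where t.cnt > 1
limit 20;
```

If this returns any rows, every pageview whose file_id matches one of those ids contributes cnt rows to the join, which would be consistent with a join count (89375203) larger than the pageviews count (55504011).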

Discussion Overview
group: user @
categories: hive, hadoop
posted: Feb 4, '10 at 4:41p
active: Feb 4, '10 at 6:24p
posts: 3
users: 2
website: hive.apache.org