Hi all,

I read the Hive tutorial, and it says that "no two aggregations can have
different DISTINCT columns". Could anyone tell me the reason? Also, will the
DISTINCT in the following query be translated into a map-reduce job, or is it
done locally?

INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid),
count(DISTINCT pv_users.ip)
FROM pv_users
GROUP BY pv_users.gender;


--
Best Regards

Jeff Zhang


  • Zheng Shao at Feb 25, 2010 at 9:08 am
    This will get a compilation error.
    The reason is that we use the sort phase in the reducers to detect
    duplicate values, and we can only sort the table in one order, on one
    set of columns, not on two different ones at the same time.

    See https://issues.apache.org/jira/browse/HIVE-537 and
    https://issues.apache.org/jira/browse/HIVE-474 for details.

    Zheng


    --
    Yours,
    Zheng
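Zheng's explanation can be illustrated with a small simulation. The Python sketch below is a simplified model, not Hive's actual reducer code: it assumes the shuffle delivers each gender's rows sorted on one composite key, (gender, userid), and that the reducer detects a duplicate by comparing each value only with the previous one. The rows are made-up sample data, and the helpers count_distinct_sorted and count_distinct_exact are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter

# Made-up pv_users rows (hypothetical data): (gender, userid, ip).
ROWS = [
    ("F", "u1", "10.0.0.1"),
    ("F", "u2", "10.0.0.1"),
    ("F", "u1", "10.0.0.2"),
    ("M", "u3", "10.0.0.3"),
    ("M", "u3", "10.0.0.3"),
]

GENDER, USERID, IP = 0, 1, 2  # column positions in each row


def count_distinct_sorted(rows, group_col, distinct_col):
    """Model of a reducer that spots duplicates by comparing each value only
    with the previous one. Correct only if, within each group, the rows
    arrive sorted on distinct_col."""
    result = {}
    for key, group in groupby(rows, key=itemgetter(group_col)):
        count, prev = 0, object()  # sentinel never equal to a real value
        for row in group:
            if row[distinct_col] != prev:
                count += 1
                prev = row[distinct_col]
        result[key] = count
    return result


def count_distinct_exact(rows, group_col, distinct_col):
    """Reference answer: keep a set of seen values per group."""
    seen = {}
    for row in rows:
        seen.setdefault(row[group_col], set()).add(row[distinct_col])
    return {key: len(values) for key, values in seen.items()}


# The shuffle sorts on a single composite key; here it is (gender, userid).
shuffled = sorted(ROWS, key=itemgetter(GENDER, USERID))

print(count_distinct_sorted(shuffled, GENDER, USERID))  # {'F': 2, 'M': 1} (correct)
print(count_distinct_sorted(shuffled, GENDER, IP))      # {'F': 3, 'M': 1} (over-counts F)
print(count_distinct_exact(ROWS, GENDER, IP))           # {'F': 2, 'M': 1}
```

Because the rows are sorted on userid within each gender, the adjacent-comparison count for userid is right, but the same pass over ip reports 3 addresses for F when there are only two. One sort order cannot serve two different DISTINCT columns, which, per Zheng's reply, is why the compiler rejects the query.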
  • Mafish Liu at Feb 25, 2010 at 9:17 am
    Hive does not support multi-distinct in one query.

    We have implemented multi-distinct on top of Hive 0.4.2rc for our own needs.
    We don't know whether Hive is interested in this feature.

    --
    Mafish@gmail.com
  • Zheng Shao at Feb 25, 2010 at 9:20 am
    Yes, definitely. Do you want to open a JIRA and post a patch?
    Please link the new JIRA to the other two JIRAs that were mentioned
    earlier in this thread.

    Zheng

    --
    Yours,
    Zheng
  • Amr Awadallah at Feb 25, 2010 at 9:26 am
    +1, please post jira/patch.

    -- amr
  • Mafish Liu at Feb 25, 2010 at 10:12 am

    2010/2/25 Zheng Shao <zshao9@gmail.com>:
    > Yes definitely. Do you want to open a JIRA and post a patch?
    > Please link the new JIRA to the other 2 JIRA that was mentioned in the
    > same email thread.
    I'll open a JIRA, and I'll post the patch once the code and documentation
    have been tidied up.


    --
    Mafish@gmail.com
  • Todd Lipcon at Feb 25, 2010 at 3:46 pm
    I think you can use this existing JIRA:

    http://issues.apache.org/jira/browse/HIVE-474

    Thanks
    -Todd
  • Mafish Liu at Feb 26, 2010 at 4:06 am

    2010/2/25 Todd Lipcon <todd@cloudera.com>:
    > I think you can use this existing JIRA:
    > http://issues.apache.org/jira/browse/HIVE-474
    I'm using this JIRA. Thanks.

    --
    Mafish@gmail.com
  • Mafish Liu at Mar 30, 2010 at 9:23 am
    Patch uploaded.
    Please review it at https://issues.apache.org/jira/browse/HIVE-474

    --
    Mafish@gmail.com
  • Mafish Liu at Feb 25, 2010 at 9:23 am
    Here are the results of our multi-distinct implementation:

    hive> describe classes;
    OK
    name string
    number string
    class string
    Time taken: 0.122 seconds
    hive> select * from classes;
    OK
    1 11 8
    2 22 12
    4 212 2
    5 232 23
    6 22 2
    7 22 2
    3 333 13
    3 33 3
    4 133 32
    5 33 3
    Time taken: 0.154 seconds

    hive> select count(distinct name), count(distinct number), class from
    classes group by class;
    ....
    1 1 12
    1 1 13
    3 2 2
    1 1 23
    2 1 3
    1 1 32
    1 1 8


    --
    Mafish@gmail.com
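The query output in Mafish Liu's Feb 25, 9:23 am message can be checked by hand. The Python sketch below recomputes the expected result of `select count(distinct name), count(distinct number), class from classes group by class` from the ten rows listed by `select * from classes`, keeping an independent set of values per DISTINCT column for each class. It uses in-memory hash sets rather than the sort-based reducer discussed in this thread, and it makes no claim about how the multi-distinct patch works; the function name multi_distinct is hypothetical.

```python
# Rows copied from the `select * from classes` output: (name, number, class).
# All three columns are strings, as `describe classes` shows.
CLASSES = [
    ("1", "11", "8"),
    ("2", "22", "12"),
    ("4", "212", "2"),
    ("5", "232", "23"),
    ("6", "22", "2"),
    ("7", "22", "2"),
    ("3", "333", "13"),
    ("3", "33", "3"),
    ("4", "133", "32"),
    ("5", "33", "3"),
]


def multi_distinct(rows):
    """For each class, count distinct names and distinct numbers
    independently, as count(distinct name), count(distinct number) should."""
    names, numbers = {}, {}
    for name, number, cls in rows:
        names.setdefault(cls, set()).add(name)
        numbers.setdefault(cls, set()).add(number)
    # class is a string column, so keys sort as strings ("12" < "2" < "8").
    return [(len(names[c]), len(numbers[c]), c) for c in sorted(names)]


for row in multi_distinct(CLASSES):
    print(*row)
# prints:
# 1 1 12
# 1 1 13
# 3 2 2
# 1 1 23
# 2 1 3
# 1 1 32
# 1 1 8
```

The recomputed counts match the Hive output row for row. Sorting class as a string also reproduces the row order shown, but without an ORDER BY clause that order should not be relied on.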

Discussion Overview
group: user @
categories: hive, hadoop
posted: Feb 25, '10 at 9:01a
active: Mar 30, '10 at 9:23a
posts: 10
users: 5
website: hive.apache.org
