Grokbase Groups Hive user July 2009
FAQ
Hi HIVErs,

I'm trying to perform the following aggregation query in HIVE, which
finds the largest purchase for all combinations of customer and store:

SELECT customer, store, max(purchasePrice)
FROM transactions
GROUP BY customer, store

If aggregation over multiple columns is not currently supported, how
might I reformulate this to work in HIVE, possibly via a simpler
series of queries?

(I will post the exact error and reproducible code if it turns out
this query is valid).

regards,

Mike

b: www.dataspora.com/blog
t: www.twitter.com/dataspora

Search Discussions

  • Amr Awadallah at Jul 4, 2009 at 7:02 am
    Mike,

    This is a valid query, group by over multiple columns works in hive.

    -- amr

    Michael E. Driscoll wrote:
    Hi HIVErs,

    I'm trying to perform the following aggregation query in HIVE, which
    finds the largest purchase for all combinations of customer and store:

    SELECT customer, store, max(purchasePrice)
    FROM transactions
    GROUP BY customer, store

    If aggregation over multiple columns is not currently supported, how
    might I reformulate this to work in HIVE, possibly via a simpler
    series of queries?

    (I will post the exact error and reproducible code if it turns out
    this query is valid).

    regards,

    Mike

    b: www.dataspora.com/blog
    t: www.twitter.com/dataspora
  • Avram Aelony at Jul 6, 2009 at 4:18 pm
    The documentation appears to state the following:

    "Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns .e.g while the following is possible "

    " INSERT OVERWRITE TABLE pv_gender_agg
    SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(1), sum(DISTINCT pv_users.userid)
    FROM pv_users
    GROUP BY pv_users.gender;
    However, the following query is not allowed. We don't allow multiple DISTINCT expressions in the same query.

    INSERT OVERWRITE TABLE pv_gender_agg
    SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
    FROM pv_users
    GROUP BY pv_users.gender;"


    http://wiki.apache.org/hadoop/Hive/LanguageManual/GroupBy


    Is there an effort underway to allow multiple DISTINCT expressions in the same query in the (near) future as well?

    Thanks & regards,
    Avram




    -----Original Message-----
    From: Amr Awadallah
    Sent: Saturday, July 04, 2009 12:02 AM
    To: hive-user@hadoop.apache.org
    Subject: Re: aggregations over multiple columns?

    Mike,

    This is a valid query, group by over multiple columns works in hive.

    -- amr

    Michael E. Driscoll wrote:
    Hi HIVErs,

    I'm trying to perform the following aggregation query in HIVE, which
    finds the largest purchase for all combinations of customer and store:

    SELECT customer, store, max(purchasePrice)
    FROM transactions
    GROUP BY customer, store

    If aggregation over multiple columns is not currently supported, how
    might I reformulate this to work in HIVE, possibly via a simpler
    series of queries?

    (I will post the exact error and reproducible code if it turns out
    this query is valid).

    regards,

    Mike

    b: www.dataspora.com/blog
    t: www.twitter.com/dataspora
  • Jeff Hammerbacher at Jul 7, 2009 at 2:51 am
    Avram: see https://issues.apache.org/jira/browse/HIVE-474.
    On Mon, Jul 6, 2009 at 9:17 AM, Avram Aelony wrote:


    The documentation appears to state the following:

    "Multiple aggregations can be done at the same time, however, no two
    aggregations can have different DISTINCT columns .e.g while the following is
    possible "

    " INSERT OVERWRITE TABLE pv_gender_agg
    SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(1),
    sum(DISTINCT pv_users.userid)
    FROM pv_users
    GROUP BY pv_users.gender;
    However, the following query is not allowed. We don't allow multiple
    DISTINCT expressions in the same query.

    INSERT OVERWRITE TABLE pv_gender_agg
    SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT
    pv_users.ip)
    FROM pv_users
    GROUP BY pv_users.gender;"


    http://wiki.apache.org/hadoop/Hive/LanguageManual/GroupBy


    Is there an effort underway to allow multiple DISTINCT expressions in the
    same query in the (near) future as well?

    Thanks & regards,
    Avram




    -----Original Message-----
    From: Amr Awadallah
    Sent: Saturday, July 04, 2009 12:02 AM
    To: hive-user@hadoop.apache.org
    Subject: Re: aggregations over multiple columns?

    Mike,

    This is a valid query, group by over multiple columns works in hive.

    -- amr

    Michael E. Driscoll wrote:
    Hi HIVErs,

    I'm trying to perform the following aggregation query in HIVE, which
    finds the largest purchase for all combinations of customer and store:

    SELECT customer, store, max(purchasePrice)
    FROM transactions
    GROUP BY customer, store

    If aggregation over multiple columns is not currently supported, how
    might I reformulate this to work in HIVE, possibly via a simpler
    series of queries?

    (I will post the exact error and reproducible code if it turns out
    this query is valid).

    regards,

    Mike

    b: www.dataspora.com/blog
    t: www.twitter.com/dataspora
  • Ashish Thusoo at Jul 6, 2009 at 7:54 pm
    Hi Michael,

    This is supported and should work.

    Ashish

    -----Original Message-----
    From: Michael E. Driscoll
    Sent: Friday, July 03, 2009 2:35 AM
    To: hive-user@hadoop.apache.org
    Subject: aggregations over multiple columns?

    Hi HIVErs,

    I'm trying to perform the following aggregation query in HIVE, which finds the largest purchase for all combinations of customer and store:

    SELECT customer, store, max(purchasePrice)
    FROM transactions
    GROUP BY customer, store

    If aggregation over multiple columns is not currently supported, how might I reformulate this to work in HIVE, possibly via a simpler series of queries?

    (I will post the exact error and reproducible code if it turns out this query is valid).

    regards,

    Mike

    b: www.dataspora.com/blog
    t: www.twitter.com/dataspora

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedJul 3, '09 at 9:35a
activeJul 7, '09 at 2:51a
posts5
users5
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase