The documentation appears to state the following:
"Multiple aggregations can be done at the same time, however, no two aggregations can have different DISTINCT columns .e.g while the following is possible "
" INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(1), sum(DISTINCT pv_users.userid)
GROUP BY pv_users.gender;
However, the following query is not allowed. We don't allow multiple DISTINCT expressions in the same query.
INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
GROUP BY pv_users.gender;"http://wiki.apache.org/hadoop/Hive/LanguageManual/GroupBy
Is there an effort underway to allow multiple DISTINCT expressions in the same query in the (near) future as well?
Thanks & regards,
From: Amr Awadallah
Sent: Saturday, July 04, 2009 12:02 AM
Subject: Re: aggregations over multiple columns?
This is a valid query, group by over multiple columns works in hive.
Michael E. Driscoll wrote:
I'm trying to perform the following aggregation query in HIVE, which
finds the largest purchase for all combinations of customer and store:
SELECT customer, store, max(purchasePrice)
GROUP BY customer, store
If aggregation over multiple columns is not currently supported, how
might I reformulate this to work in HIVE, possibly via a simpler
series of queries?
(I will post the exact error and reproducible code if it turns out
this query is valid).