Grokbase Groups Hive user June 2011
Hello,



What's the plan to support fully aggregated lists (duplicates included),
built by reading a table in order? (See below.)



I have a fairly complex (45-line) SELECT script in Hive with joins,
unions, etc., to which I have to add a list of aggregated values from a
field.

Data aside, I'm using collect_set to build the list of those values, but
it de-duplicates them, and I need the duplicates.



I've posted this on Stack Overflow (with a +50 bounty):

http://stackoverflow.com/questions/6445339/collect-set-in-hive-keep-duplicates

No hits.



Would I need to edit the original collect_set Java file and make my
own function? Or could I use a Python script with TRANSFORM()?
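
For what it's worth, here is a minimal sketch of what the Python TRANSFORM() route might look like. It is not a tested solution, only an illustration under some assumptions: the script receives tab-separated (key, value) rows on stdin, rows for the same key arrive consecutively (for example via DISTRIBUTE BY / SORT BY in a subquery), and a comma is an acceptable list separator. The names collect_lists, collect_list.py, my_table, k and v are all hypothetical.

```python
import sys
from itertools import groupby

# Hypothetical Hive usage (table and column names are made up):
#   ADD FILE collect_list.py;
#   SELECT TRANSFORM(k, v) USING 'python collect_list.py' AS (k, vals)
#   FROM (SELECT k, v FROM my_table DISTRIBUTE BY k SORT BY k, v) s;


def collect_lists(lines, sep=","):
    """Yield 'key<TAB>v1,v2,...' for each run of consecutive rows sharing a key.

    Duplicates are kept, and values stay in the order the rows arrive.
    """
    # Split each non-empty line into [key, value] on the first tab only.
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must be clustered by key.
    for key, group in groupby(rows, key=lambda row: row[0]):
        values = [row[1] if len(row) > 1 else "" for row in group]
        yield key + "\t" + sep.join(values)


# Stream stdin to stdout when input is piped in (as Hive does);
# do nothing when run interactively with no piped input.
if __name__ == "__main__" and not sys.stdin.isatty():
    for out in collect_lists(sys.stdin):
        print(out)
```

Because the grouping relies on adjacent rows, the DISTRIBUTE BY / SORT BY step in the subquery is what makes each key's values land together and in order. The output column is a delimited string rather than a Hive array, so it would need a split() afterwards if an array is required.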



I'm aware of, but not entirely up to editing, the collect_set file:

https://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java



Thanks!



Travis Powell



Travis Powell / tpowell@tealeaf.com

Tealeaf Technology / http://www.tealeaf.com

Posted Jun 29, '11 at 8:29p