Hello all,

I'm very new to Hive (haven't even installed it yet!), but I have a use case
that I didn't see demonstrated in any of the tutorials or documentation that
I've read so far.

Let's say that I have Apache logs that I want to process with Hadoop/Hive.
There may be several different types of log records that all tie back to the
same user, IP address, or other log attribute. Is there a way to submit a
SINGLE Hive query and get back results that look like:

IP Action1Count Action2Count Action3Count

... where the different actions correspond to different log events for that
IP address?

Do I have to submit 3 different Hive queries here, or can I submit a single
Hive query? In a regular Java-based map/reduce job, I would have written a
custom Writable that records counts for each of the different actions, and
passed it to the reducer using output.collect(IP, customWritable). That way
I wouldn't have to submit multiple map/reduce jobs, just one.
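
To make the map/reduce version concrete, here is a rough sketch of the kind of value class I have in mind. The name ActionCounts, the fixed number of three actions, and the merge helper are placeholders I made up for illustration. To keep the snippet self-contained it only uses java.io; in a real job the class would also declare that it implements org.apache.hadoop.io.Writable, whose write/readFields methods have the same signatures as below.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical value type: one counter per action type for a single IP.
// In a real Hadoop job this would additionally declare
// "implements org.apache.hadoop.io.Writable".
final class ActionCounts {
    // Assumption: exactly three action types, indexed 0..2.
    static final int NUM_ACTIONS = 3;

    private final long[] counts = new long[NUM_ACTIONS];

    // Called once per log record, with the index of the action seen.
    void increment(int action) {
        counts[action]++;
    }

    long get(int action) {
        return counts[action];
    }

    // Adds another set of partial counts into this one, as a reducer
    // would do for every value it receives for the same IP.
    void merge(ActionCounts other) {
        for (int i = 0; i < NUM_ACTIONS; i++) {
            counts[i] += other.counts[i];
        }
    }

    // Serializes the counters in index order.
    public void write(DataOutput out) throws IOException {
        for (long c : counts) {
            out.writeLong(c);
        }
    }

    // Reads the counters back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        for (int i = 0; i < NUM_ACTIONS; i++) {
            counts[i] = in.readLong();
        }
    }

    // Tab-separated: Action1Count, Action2Count, Action3Count.
    @Override
    public String toString() {
        return counts[0] + "\t" + counts[1] + "\t" + counts[2];
    }
}
```

The mapper would increment one counter per log record and emit it keyed by IP, and the reducer would merge all values for that IP and write out a single row, so everything happens in one job.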


Posted Oct 10, '09 at 6:43p
Active Oct 19, '09 at 6:10p