Grokbase Groups Hive user May 2011
Hi Guys

I use Flume to store log files and Hive to query them.

Flume always stores small files with the suffix .seq, and I now have over 35
thousand seq files. Every time I launch a query script, 35 thousand map
tasks are created, and it takes a very long time to complete.

I also tried setting CombineHiveInputFormat, but with this option the
job seems to run slowly, because the total size of the data folder is over
700 MB. In my testing env I only have 3 data nodes. I also tried adding
another setting after the CombineHiveInputFormat setting, but it does not
seem to work. There is always only one map task when CombineHiveInputFormat
is set.
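
To make the numbers above concrete, here is a rough sketch of how the map task count could relate to a cap on the combined split size. It assumes the combined input format packs small files into splits up to a configurable maximum (in Hadoop this cap is usually the mapred.max.split.size property; whether Hive 0.5 honors it is an assumption, not something confirmed here). The function name and the cap values are hypothetical, and the real split count also depends on node and rack locality, so treat the result as an estimate only.

```python
MB = 1024 * 1024


def estimate_combined_splits(total_bytes: int, max_split_bytes: int) -> int:
    """Estimate the number of combined splits (one map task per split).

    Assumption: small files are packed into splits of at most
    max_split_bytes. A cap of 0 is treated as "no cap", in which case
    everything may end up in a single split.
    """
    if total_bytes <= 0:
        return 0
    if max_split_bytes <= 0:
        return 1
    # Integer ceiling division: a partially filled split still needs a task.
    return (total_bytes + max_split_bytes - 1) // max_split_bytes


if __name__ == "__main__":
    total = 700 * MB  # approximate size of the data folder from the post
    for cap_mb in (0, 256, 64, 32):  # hypothetical cap values
        tasks = estimate_combined_splits(total, cap_mb * MB)
        print(f"cap={cap_mb} MB -> about {tasks} map task(s)")
```

Under these assumptions, no cap matches the single map task described above, while a smaller cap would yield more map tasks for the same 700 MB of input.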

Can you please show me a solution that lets me set the number of map tasks freely?

BTW: the Hadoop version is 0.20 and the Hive version is 0.5.


Discussion Overview
group: user @
categories: hive, hadoop
posted: May 31, '11 at 9:56a
active: Jun 1, '11 at 7:38p


