I have a cluster of about 10 nodes and have mainly been using Pig on it, but
I would like to try out Hive.
I am running the following query, which runs fairly quickly in Pig but takes
much longer in Hive:
hive -e "select count(1) as ct from my_table where v1='02' and v2 = 11112222;" > thecount
One thing I noticed is that the job uses only 1 reducer, yet spends most of
its time in the reduce step. I tried manually setting more reducers, but I
think that for a query without a GROUP BY, Hive forces a single reducer?
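Roughly what I mean by setting the reducer count, as a sketch: the value 10 is only an illustrative choice (about one per node), and the property name assumes a Hadoop 1.x-era setup.

```sql
-- Illustrative value only: request 10 reducers (about one per node).
-- mapred.reduce.tasks is the Hadoop 1.x property name;
-- on Hadoop 2 and later the equivalent is mapreduce.job.reduces.
set mapred.reduce.tasks=10;

-- Same query as above. Because it is a global count with no GROUP BY,
-- Hive may still plan a single reducer regardless of the setting.
select count(1) as ct from my_table where v1='02' and v2 = 11112222;
```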
Either way, I would love to know why this is so slow. It's worth noting that
my_table is not stored in a Hive-specific format, but as a flat file. I
realize that this can affect performance, but shouldn't it at least perform
on par with Pig?
Thanks for your help.