Grokbase Groups Hive user March 2011
Hi,

I am testing Hive 0.6 on part of my data set. It's only a couple of GB of
log files that I am reading through a custom SerDe. The table is
partitioned. I am using Hadoop local mode for testing.

When I run simple GROUP BY queries (4 MR jobs), I get log output such as

- map : 100%
- reduce : 0%
- map : 85%
- reduce : 0%
- map : 86%
- reduce : 0%

all the while using only one core on an 8-core server. Kind of a waste...

I have enabled the parallel execution option, but the jobs still do not run
in parallel. I have also set the number of reduce tasks to 8.
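
For reference, a minimal sketch of what those two settings might look like in a Hive session. This assumes that the "parallel option" refers to the hive.exec.parallel property and that the reducer count was set through mapred.reduce.tasks; the post does not name the exact properties used, so both names are assumptions.

```sql
-- Assumption: the "parallel option" is hive.exec.parallel, which lets
-- independent stages of a single query be submitted concurrently.
SET hive.exec.parallel=true;

-- Assumption: the reducer count was set through mapred.reduce.tasks.
SET mapred.reduce.tasks=8;
```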

My expectation is that since my data set is partitioned (=> different
files), at least some of the map-reduce phases could run in parallel on
those files.

Is my understanding wrong? Is there a specific way to write the queries?

Thanks
Philippe

Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 8, '11 at 11:01p
active: Mar 8, '11 at 11:01p
posts: 1
users: 1
website: hive.apache.org

1 user in discussion

Philippe Girolami: 1 post
