Grokbase Groups Hive user March 2011

I am testing Hive 0.6 on part of my data set. It's only a couple of GB of
log files that I read through a custom SerDe. The table is
partitioned. I am using Hadoop local mode for testing.

When I run simple GROUP BY queries (4 MR jobs), I get progress logs such as:

- map : 100%
- reduce : 0%
- map : 85%
- reduce : 0%
- map : 86%
- reduce : 0%

all the while using only one core on an 8-core server. Kind of a waste...

I have activated the parallel option, but it still won't parallelize. I have
also set the number of reduce tasks to 8.
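Assuming the "parallel option" refers to Hive's `hive.exec.parallel` property and the reducer count was set through Hadoop's `mapred.reduce.tasks` property, the session settings described would look roughly like this. This is a sketch; the exact statements used are not shown in the post.

```sql
-- Assumed to be the "parallel option": lets independent stages of one query run concurrently
SET hive.exec.parallel=true;
-- Request 8 reduce tasks per MapReduce job
SET mapred.reduce.tasks=8;
```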

My expectation is that, since my data set is partitioned (=> different
files), at least some of the map-reduce phases could run in parallel on
those files.

Is my understanding wrong? Is there a specific way to write the queries?


Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 8, '11 at 11:01p
active: Mar 8, '11 at 11:01p

1 user in discussion

Philippe Girolami: 1 post


