Grokbase Groups Hive user March 2011
Hi,

Quick note on #3. To make mapred.reduce.tasksperslot work, you need to
completely remove every mention of mapred.reduce.tasks from your
configuration (including the default config file).
mapred.reduce.tasksperslot only takes effect as a last resort, when
mapred.reduce.tasks is not set anywhere.

Andrew
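
A minimal sketch of how one might check a Hadoop-style configuration file for leftover mapred.reduce.tasks entries before relying on mapred.reduce.tasksperslot. It assumes the standard <configuration>/<property> XML layout; the sample contents and the helper name find_property are illustrative, not taken from an actual EMR setup, and values set elsewhere (for example in the Hive script itself) would have to be checked separately.

```python
import xml.etree.ElementTree as ET


def find_property(config_xml: str, name: str) -> list:
    """Return the value of every <property> whose <name> is exactly `name`."""
    root = ET.fromstring(config_xml)
    values = []
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            values.append(prop.findtext("value", default="").strip())
    return values


# Hypothetical config contents, for illustration only.
SAMPLE = """<configuration>
  <property><name>mapred.reduce.tasks</name><value>1</value></property>
  <property><name>mapred.reduce.tasksperslot</name><value>1</value></property>
</configuration>"""

leftover = find_property(SAMPLE, "mapred.reduce.tasks")
if leftover:
    # Per the note above, tasksperslot is ignored while this entry exists.
    print("mapred.reduce.tasks is still set:", leftover)
```

The match is on the exact property name, so mapred.reduce.tasksperslot itself is not reported as a leftover.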
On Wed, Mar 9, 2011 at 11:50 AM, Igor Tatarinov wrote:
I understand that Hive and Hadoop are meant to run many jobs at once. As a
result, most tuning parameters are meant to improve the throughput of a
Hadoop cluster rather than the latency of a single job. In our case, we use
Elastic Map Reduce to run a single Hive script on a daily basis. For that
reason, our top priority is to make the script run faster. So far, it's been
a pretty frustrating experience. I am curious whether there are workarounds
for the things that are not easy to tune:
1) In particular, Hadoop lets you configure
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum individually, but there is no way to
limit the total of the two. Hive mappers seem to always finish before the
reducers, and I wish I could run one more reducer when no mappers are
running. That doesn't seem to be possible.
2) Similarly, there is only one parameter to control memory
allocation: mapred.child.java.opts. So if my box is configured for 4 mappers
and 2 reducers, I have to set that parameter to less than 1/6 of the total
memory available. The problem is that once the mappers are done, 4/6 (two
thirds) of all memory is essentially unused. Is there something I can do
about that?
3) Another odd thing is not being able to run a single wave of reducers
easily. As I understand it, that's the optimal scenario in most cases. To
make this work, I have to know the total number of reducer slots in the
cluster and then define mapred.reduce.tasks accordingly. EMR seems to have a
solution for this problem (mapred.reduce.tasksperslot), but it doesn't seem
to work.
Any suggestions would be greatly appreciated!
Thank you,
igor
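
The arithmetic behind points 2 and 3 can be sketched as follows. This is only an illustration: the node count and memory size are made-up numbers, and the function names are hypothetical. It assumes every node has the same slot configuration and ignores memory needed by the OS and the Hadoop daemons, which is why the heap figure is an upper bound rather than a recommended setting.

```python
from fractions import Fraction


def single_wave_reduce_tasks(nodes: int, reduce_slots_per_node: int) -> int:
    """Reducer count that fills every reduce slot in the cluster exactly once."""
    return nodes * reduce_slots_per_node


def heap_ceiling_mb(node_memory_mb: int, map_slots: int, reduce_slots: int) -> int:
    """Upper bound on per-task heap when all slots share one heap setting."""
    return node_memory_mb // (map_slots + reduce_slots)


def idle_fraction_after_maps(map_slots: int, reduce_slots: int) -> Fraction:
    """Share of task memory left unused once only reducers are running."""
    return Fraction(map_slots, map_slots + reduce_slots)


# Hypothetical cluster: 10 nodes, 4 map + 2 reduce slots, 6144 MB per node.
nodes, map_slots, reduce_slots, node_memory_mb = 10, 4, 2, 6144

print(f"set mapred.reduce.tasks={single_wave_reduce_tasks(nodes, reduce_slots)};")
print(f"heap ceiling per task: {heap_ceiling_mb(node_memory_mb, map_slots, reduce_slots)} MB")
print(f"idle after map phase: {idle_fraction_after_maps(map_slots, reduce_slots)}")
```

With 4 map slots and 2 reduce slots, the idle share comes out to 2/3, matching the figure in point 2.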

Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 9, '11 at 7:51p
active: Mar 14, '11 at 9:26p
posts: 2
users: 2
website: hive.apache.org
