On Mon, Jun 28, 2010 at 9:07 AM, Scott Whitecross wrote:
I would like to control the maximum number of reducers a Hive query has
access to. I have seen cases of Hive using up to 999 reducers, which seems
inefficient (the overhead of starting and stopping individual reducers), and I'd
also like to cap the resources Hive uses on the cluster. (I'm investigating the
fair scheduler as well, which hopefully works well.)
I haven't seen any conclusive settings in the documentation, so what options
are there to throttle Hive (using 0.4 at the moment)? hive.exec.reducers.max
is mentioned in a JIRA item, but not in the Hive documentation. Does it work?
Thanks.
Scott,
Where did you find Hive 0.4 in someone's attic? Just kidding. It is not
that old in terms of years, but it is old in terms of releases.
Upgrade to the latest release, 0.5.1; 0.6.0 is nearing release as well, so you
may want to wait for that or run trunk.
You can see all the available configuration options in hive-default.xml.
Here are the relevant values (some or most of these features may not be in Hive 0.4):
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>1000000000</value>
<description>Size per reducer. The default is 1GB, i.e. if the input
size is 10GB, Hive will use 10 reducers.</description>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>999</value>
<description>The maximum number of reducers that will be used. If the
value specified in the configuration parameter mapred.reduce.tasks is
negative, Hive will use this as the maximum number of reducers when
automatically determining the number of reducers.</description>
</property>
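These two settings work together: Hive estimates the reducer count by dividing the total input size by hive.exec.reducers.bytes.per.reducer and capping the result at hive.exec.reducers.max. A minimal sketch of that estimate (my own illustration of the arithmetic, not Hive's actual code):

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,
                      max_reducers=999):
    """Approximate Hive's automatic reducer count: one reducer per
    bytes_per_reducer of input, capped at max_reducers."""
    return min(max_reducers, max(1, math.ceil(input_bytes / bytes_per_reducer)))

# 10 GB of input with the default 1 GB per reducer -> 10 reducers
print(estimate_reducers(10_000_000_000))                  # 10
# A huge input is capped at hive.exec.reducers.max
print(estimate_reducers(5_000_000_000_000))               # 999
# Lowering the cap throttles the job
print(estimate_reducers(10_000_000_000, max_reducers=4))  # 4
```

So dropping hive.exec.reducers.max well below 999 is the direct way to get the throttling you asked about.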
<property>
<name>hive.merge.size.per.task</name>
<value>256000000</value>
<description>Size of merged files at the end of the job</description>
</property>
<property>
<name>hive.merge.size.smallfiles.avgsize</name>
<value>16000000</value>
<description>When the average output file size of a job is less than
this number, Hive will start an additional map-reduce job to merge the
output files into bigger files. This is only done for map-only jobs
if hive.merge.mapfiles is true, and for map-reduce jobs if
hive.merge.mapredfiles is true.</description>
</property>
<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
<description>Merge small files at the end of a map-only job</description>
</property>
<property>
<name>hive.merge.mapredfiles</name>
<value>false</value>
<description>Merge small files at the end of a map-reduce job</description>
</property>
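You don't have to edit the XML files to change any of these; they can be overridden per session from the Hive CLI with SET. For example, to cap reducers and merge small map-reduce output (the specific values here are just illustrative, pick ones that fit your cluster):

```sql
-- per-session overrides, affect only subsequent queries in this session
SET hive.exec.reducers.max=32;
SET hive.exec.reducers.bytes.per.reducer=1000000000;
SET hive.merge.mapredfiles=true;
```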
Other performance-related settings:
<property>
<name>hive.exec.compress.intermediate</name>
<value>false</value>
<description>This controls whether intermediate files produced by
Hive between multiple map-reduce jobs are compressed. The compression
codec and other options are determined by the Hadoop config variables
mapred.output.compress*.</description>
</property>
<property>
<name>hive.exec.parallel</name>
<value>false</value>
<description>Whether to execute jobs in parallel</description>
</property>
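Both of these can likewise be flipped per session. Intermediate compression trades CPU for I/O and is usually worth trying on large multi-stage queries; a sketch, assuming your Hadoop side already has a compression codec configured:

```sql
-- compress data passed between the map-reduce stages of a multi-job query
SET hive.exec.compress.intermediate=true;
-- run independent stages of a query concurrently
SET hive.exec.parallel=true;
```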
Regards,
Edward