Hi,

I'm currently doing some testing of different configurations using the
Hadoop Sort as follows:

bin/hadoop jar hadoop-*-examples.jar randomwriter -Dtest.randomwrite.total_bytes=107374182400 /benchmark100

bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
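
(For scale: 107374182400 bytes is exactly 100 GiB, i.e. 100 x 1024^3,
so on the 4-node cluster described below each node generates and sorts
roughly 25 GiB.)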

The only changes I've made from the standard config are the following in
conf/mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

I'm running this on 4 systems, each with 8 processor cores and 4
separate disks.
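
(For context: 8 map slots plus 4 reduce slots at -Xmx1024M each caps
task heap at roughly 12 GB per node, but each JVM commits only the
memory its task actually touches.)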

Is there anything else I should change to stress memory more? The
systems in question have 16GB of memory, but the most that's getting
used during a run of this benchmark is about 2GB (and most of that
seems to be OS caching).

Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com


  • Aaron Kimball at Jun 10, 2009 at 4:14 pm
    Hi Stephen,

    That will set the maximum allowable heap, but it doesn't necessarily
    tell Hadoop's internals to take advantage of it. There are a number
    of other settings that affect performance. At Cloudera we have a
    config tool that generates Hadoop configurations with reasonable
    first-approximation values for your cluster -- check out
    http://my.cloudera.com and look at the hadoop-site.xml it generates.
    If you start from there you might find a better parameter space to
    explore. Please share back your findings -- we'd love to tweak the
    tool further with some external feedback :)

    - Aaron
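
    To illustrate the point above: -Xmx only raises the ceiling, while
    buffers such as the map-side sort buffer io.sort.mb (default 100 MB
    in this era) decide how much of that heap the sort actually uses. A
    minimal sketch pairing the two -- the 400 MB value is illustrative,
    not a tested recommendation:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024M</value>
    </property>

    <!-- The sort buffer must fit inside the child heap above; raising
         it from the 100 MB default lets tasks actually use the heap. -->
    <property>
      <name>io.sort.mb</name>
      <value>400</value>
    </property>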


  • Owen O'Malley at Jun 10, 2009 at 7:15 pm
    Take a look at Arun's slide deck on Hadoop performance:

    http://bit.ly/EDCg3

    It is important to get io.sort.mb large enough, and io.sort.factor
    should be closer to 100 than the default of 10. I'd also use large
    block sizes to reduce the number of maps. Please see the deck for
    other important factors.

    -- Owen
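
    Concretely -- a minimal sketch with era-appropriate parameter names
    and illustrative values: io.sort.factor sets how many spill files
    are merged at once, and dfs.block.size (in bytes) is fixed per file
    at write time, so it must be in effect when randomwriter creates the
    input. With 256 MB blocks, the 100 GiB input yields about 400 maps
    instead of the roughly 1600 produced by the default 64 MB blocks.

    <property>
      <name>io.sort.factor</name>
      <value>100</value>
    </property>

    <!-- 268435456 bytes = 256 MB; block size applies at file creation,
         so set it (or pass it with -D) before running randomwriter. -->
    <property>
      <name>dfs.block.size</name>
      <value>268435456</value>
    </property>
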
  • Matei Zaharia at Jun 11, 2009 at 9:47 am
    Owen, one problem with Arun's slide deck is that while it lists the
    parameters that matter, it doesn't suggest values for them. Do you
    have a guide for that? In particular, the only places I know of that
    discuss how to set these parameters are
    http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
    and http://wiki.apache.org/hadoop/FAQ#3.
