Hardware performance from HADOOP cluster
Hi,

Is there a way to tell what kind of performance numbers one can expect
out of their cluster given a certain set of specs?

For example, I have 5 nodes in my cluster that all have the following
hardware configuration:
Quad Core 2.0GHz, 8GB RAM, 4x1TB disks, all on the same rack.

Thanks,
Usman

  • Tim robertson at Oct 14, 2009 at 9:02 am
    Might it be worth running the sort benchmark
    (http://wiki.apache.org/hadoop/Sort) and posting your results for
    comment?
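
    (For anyone who has not run it before, a minimal sketch of the three
    steps that page describes; jar names vary by release and the HDFS
    paths here are only examples:

      hadoop jar hadoop-*-examples.jar randomwriter /bench/unsorted
      hadoop jar hadoop-*-examples.jar sort /bench/unsorted /bench/sorted
      hadoop jar hadoop-*-test.jar testmapredsort \
        -sortInput /bench/unsorted -sortOutput /bench/sorted
    )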

    Tim

  • Usman Waheed at Oct 14, 2009 at 9:41 am
    Thanks Tim, I will check it out and post my results for comments.
    -Usman

  • Tim robertson at Oct 14, 2009 at 9:47 am
    I am setting up a new cluster of 10 nodes with 2.83GHz quad-core CPUs
    (2x6MB cache), 8G RAM and 2x500G drives, and will do the same soon.
    Got some issues though, so it won't start up yet...

    Tim


  • Usman Waheed at Oct 14, 2009 at 11:12 am
    Here are the results I got from my 4-node cluster (correction: I noted 5
    earlier). One of the 4 nodes is both a namenode and a datanode.

    GENERATE RANDOM DATA
    Wrote out 40GB of random binary data:
    Map output records=4088301
    The job took 358 seconds. (approximately: 6 minutes).

    SORT RANDOM GENERATED DATA
    Map output records=4088301
    Reduce input records=4088301
    The job took 2136 seconds. (approximately: 35 minutes).

    VALIDATION OF SORTED DATA
    The job took 183 seconds.
    SUCCESS! Validated the MapReduce framework's 'sort' successfully.

    It would be interesting to see what performance numbers others with a
    similar setup have obtained.

    Thanks,
    Usman


  • Todd Lipcon at Oct 14, 2009 at 5:18 pm
    This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
    you changed the configurations at all? There are some notes on this
    blog post that might help your performance a bit:

    http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/

    How many map and reduce slots did you configure for the daemons? If
    you have Ganglia installed you can usually get a good idea of whether
    you're using your resources well by looking at the graphs while
    running a job like this sort.

    -Todd
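
    (For context, Todd's figure presumably comes straight from the numbers
    above: 40 GB sorted in 2136 s is about 19 MB/s aggregate, or roughly
    4.8 MB/s per node across 4 nodes.)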
  • Chris Seline at Oct 14, 2009 at 5:47 pm
    Unless my calcs are off, that is right in line with the terabyte sort
    record:

    1TB = roughly 1,000,000 MB
    1460 nodes
    62 seconds
    1000000/1460/62 = 11MB/s per machine

    but they used 2.5GHz quad-core DUAL-CPU servers, so they have roughly
    2.5x the horsepower of Usman's setup.

    http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html

    Or am I wrong?

    Chris


  • Usman Waheed at Oct 15, 2009 at 9:33 am
    Hi Todd,

    Some changes have been applied to the cluster based on the documentation
    (URL) you noted, like the file descriptor settings and
    io.file.buffer.size. I will check out the other settings which I haven't
    applied yet.

    Here are my map/reduce slot settings from hadoop-site.xml and
    hadoop-default.xml on all nodes in the cluster:

    hadoop-site.xml:
    mapred.tasktracker.task.maximum = 2
    mapred.tasktracker.map.tasks.maximum = 8
    mapred.tasktracker.reduce.tasks.maximum = 8

    hadoop-default.xml:
    mapred.map.tasks = 2
    mapred.reduce.tasks = 1

    Thanks,
    Usman
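
    (For anyone reproducing this, a sketch of how one such slot setting is
    expressed in hadoop-site.xml; the property name and value are the ones
    quoted above:

      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>8</value>
      </property>
    )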

  • Tim robertson at Oct 15, 2009 at 3:34 pm
    Hi Usman,

    So on my 10-node cluster (9 DN) with 4 maps and 4 reduces (I plan on
    high-memory jobs, so picked only 4):
    [9 DN of Dell R300: 2.83G Quadcore (2x6MB cache), 8G RAM and 2x500G SATA drives]

    Using your template for stats, I get the following with no tuning:

    GENERATE RANDOM DATA
    Wrote out 90GB of random binary data:
    Map output records=9198009
    The job took 350 seconds. (approximately: 6 minutes)

    SORT RANDOM GENERATED DATA
    Map output records= 9197821
    Reduce input records=9197821
    The job took 2176 seconds. (approximately: 36mins).

    So pretty similar to your initial benchmark. I will tune it a bit
    tomorrow and rerun.

    If you spent time tuning your cluster and it was successful, please
    can you share your config?

    Cheers,
    Tim

  • Usman Waheed at Oct 15, 2009 at 3:38 pm
    Hi Tim,

    Thanks much for sharing the info.
    I will most certainly share my configuration settings after applying
    some tuning at my end.
    Will let you know the results on this email list.

    Thanks,
    Usman
  • Patrick Angeles at Oct 15, 2009 at 3:52 pm
    Hi Tim,
    I assume those are single proc machines?

    I got 649 secs on 70GB of data for our 7-node cluster (~11 mins), but we
    have dual quad-core Nehalems (2.26GHz).


  • Tim robertson at Oct 15, 2009 at 7:08 pm
    Yeah, they are single-proc machines, and other than setting 4
    maps/reduces it's a completely vanilla 0.20.1 installation.

    I will tune it up in the morning based on what I can find on the web
    (e.g. cloudera guidelines) and post the results. I am going to be
    running HBase on top of this, but want to make sure the HDFS/MR is
    running sound before continuing.

    Seems there are a few people at the moment setting up clusters - might
    it be worth adding our config and results to
    http://wiki.apache.org/hadoop/HardwareBenchmarks ?

    For people like me (first cluster set up from scratch; previously I
    used the EC2 scripts) it is nice to sanity-check that things look about
    right. The mailing lists suggest there are a few small clusters of
    medium-spec machines springing up.

    Cheers,
    Tim

  • Erik Forsberg at Oct 16, 2009 at 8:00 am

    I would also like to know what settings people are tuning on the
    operating system level. The blog post mentioned here does not mention
    much about that, except for the fileno changes.

    We got about 3x the read performance when running DFSIOTest by mounting
    our ext3 filesystems with the noatime parameter. I saw that mentioned
    in the slides from some Cloudera presentation.

    (For those who don't know, the noatime parameter turns off the
    recording of access time on files. That's a horrible performance killer
    since it means every read of a file also means that the kernel must do
    a write. These writes are probably queued up, but still, if you don't
    need the atime (very few applications do), turn it off!)
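
    (A hypothetical /etc/fstab entry for a data disk, with device and
    mount point assumed:

      /dev/sdb1  /data/1  ext3  defaults,noatime  0 0
    )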

    Have people been experimenting with different filesystems, or are most
    of us running on top of ext3?

    How about mounting ext3 with "data=writeback"? That's rumoured to give
    the best throughput and could help with write performance. From
    mount(8):

    writeback
    Data ordering is not preserved - data may be written into the main file system
    after its metadata has been committed to the journal. This is rumoured to be the
    highest throughput option. It guarantees internal file system integrity,
    however it can allow old data to appear in files after a crash and journal recovery.

    How would the HDFS consistency checks cope with old data appearing in
    the underlying files after a system crash?

    Cheers,
    \EF
    --
    Erik Forsberg <forsberg@opera.com>
    Developer, Opera Software - http://www.opera.com/
  • Tim robertson at Oct 16, 2009 at 11:01 am
    Hi all,

    Adding the following to core-site.xml, mapred-site.xml and
    hdfs-site.xml (based on Cloudera guidelines:
    http://tinyurl.com/ykupczu)
    io.sort.factor: 15 (mapred-site.xml)
    io.sort.mb: 150 (mapred-site.xml)
    io.file.buffer.size: 65536 (core-site.xml)
    dfs.datanode.handler.count: 3 (hdfs-site.xml; actually this is the default)

    and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh)

    Using 2 mappers and 2 reducers, can someone please help me with the
    maths as to why my jobs are failing with "Error: Java heap space" in
    the maps?
    (the same runs fine with io.sort.factor of 10 and io.sort.mb of 100)

    io.sort.mb of 200 x 4 (2 mappers, 2 reducers) = 0.8G
    Plus the 2 daemons on the node at 1G each = 1.8G
    Plus Xmx of 1G for each hadoop daemon task = 5.8G

    The machines have 8G in them. Obviously my maths is screwy somewhere...

    Cheers,
    Tim


  • Todd Lipcon at Oct 16, 2009 at 3:19 pm

    Hi Tim,

    Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE parameter is
    for the daemons, not the tasks. If you bump up io.sort.mb you also have to
    bump up the -Xmx argument in mapred.child.java.opts to give the actual tasks
    more RAM.
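
    For example, a minimal sketch of that pairing in mapred-site.xml, with
    illustrative values:

      <property>
        <name>io.sort.mb</name>
        <value>150</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx512m</value>
      </property>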

    -Todd

  • Usman Waheed at Oct 16, 2009 at 3:31 pm
    Hi Tim,

    I have been swamped with some other stuff so did not get a chance to run
    further tests on my setup.
    Will send them out early next week so we can compare.

    Cheers,
    Usman
  • Tim robertson at Oct 16, 2009 at 8:41 pm
    No worries Usman, I will try and do the same on Monday.

    Thanks Todd for the clarification.

    Tim

  • Tim robertson at Oct 19, 2009 at 12:55 pm
    Hi all,

    I thought I would post the findings of my tuning tests running the
    sort benchmark.

    This is all based on 10 machines (1 as master and 9 DN/TT), each of:
    Dell R300: 2.83G Quadcore (2x6MB cache 1 proc), 8G RAM and 2x500G SATA drives

    --- Vanilla installation ---
    2M 2R: 36 mins
    4M 4R: 36 mins (yes the same)


    --- Tuned according to Cloudera http://tinyurl.com/ykupczu ---
    io.sort.factor: 20  (mapred-site.xml)
    io.sort.mb: 200  (mapred-site.xml)
    io.file.buffer.size: 65536   (core-site.xml)
    mapred.child.java.opts: -Xmx512M  (mapred-site.xml)

    2M 2R: 33.5 mins
    4M 4R: 29 mins
    8M 8R: 41 mins


    --- Increasing the task memory a little ---
    io.sort.factor: 20
    io.sort.mb: 200
    io.file.buffer.size: 65536
    mapred.child.java.opts: -Xmx1G

    2M 2R: 29 mins (adding dfs.datanode.handler.count=8 resulted in 30 mins)
    4M 4R: 29 mins (yes the same)


    --- Increasing sort memory ---
    io.sort.factor: 32
    io.sort.mb: 320
    io.file.buffer.size: 65536
    mapred.child.java.opts: -Xmx1G

    2M 2R: 31 mins (yes longer than lower sort sizes)

    I am going to stick with the following for now and get back to work...
    io.sort.factor: 20
    io.sort.mb: 200
    io.file.buffer.size: 65536
    mapred.child.java.opts: -Xmx1G
    dfs.datanode.handler.count=8
    4 Mappers
    4 Reducers

    Hope that helps someone. How did your tuning go, Usman?

    Tim
  • Usman Waheed at Oct 19, 2009 at 2:14 pm
    Tim,

    I have 4 nodes (Quad Core 2.00GHz, 8GB RAM, 4x1TB disks), where one is
    the master+datanode and the rest are datanodes.

    Job: Sort 40GB of random data

    With the following current configuration settings:

    io.sort.factor: 10
    io.sort.mb: 100
    io.file.buffer.size: 65536
    mapred.child.java.opts: -Xmx200M
    dfs.datanode.handler.count=3
    2 Mappers
    2 Reducers
    Time taken: 28 minutes
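
    (Rough arithmetic: 40 GB in 28 minutes is about 24 MB/s aggregate, or
    roughly 6 MB/s per node across the 4 nodes, up from the ~4.8 MB/s/node
    of the first run.)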

    Still testing with more config changes, will send the results out.

    -Usman



  • Tim robertson at Oct 19, 2009 at 6:29 pm
    Just while I am on this performance thread, I thought I would continue...

    I am moving a 200 million record table (mostly varchar) from a MyISAM
    MySQL database to Hadoop. First MapReduce tests seem pretty positive. A
    full table scan in MySQL takes 17 minutes on a $20,000, pretty
    heavyweight 64G-memory server (sorry, I don't know the exact CPU/disk
    specs offhand), with a count that can use an indexed INTEGER (index
    pegged in memory) taking 3 minutes. After exporting the table and
    importing it into HDFS as a tab-delimited file (100GB) on the above
    cluster, the same scans take 4 minutes as a custom MR job.
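
    (A rough sketch of such an export/import, with database, table and
    paths assumed; mysqldump --tab writes a tab-delimited <table>.txt that
    can be pushed straight into HDFS:

      mysqldump --tab=/tmp/export mydb mytable
      hadoop fs -mkdir /data/mytable
      hadoop fs -put /tmp/export/mytable.txt /data/mytable/
    )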

    Actually I have 2x 200 million record tables, and a full scan requiring
    a join of both those tables in MySQL is around 30 minutes, but I joined
    them in the export before the import into Hadoop. So some of my ad hoc
    reports have just come down from 30 mins to 4 mins (and I will use Hive
    for this).

    Tim


  • Bryan Talbot at Oct 20, 2009 at 12:14 am
    Here's another data point from a small cluster running Cloudera 0.20.1:

    4 slaves, each with 2 quad-core 2.00GHz CPUs (E5405), 8 GB RAM and
    4x1TB SATA drives
    1 master running the NN, 2NN and JT


    dfs.replication=2
    io.sort.factor: 25
    io.sort.mb: 250
    io.file.buffer.size: 65536
    mapred.child.java.opts: -Xmx400M
    mapred.tasktracker.map.tasks.maximum=7
    mapred.tasktracker.reduce.tasks.maximum=7
    mapred.job.reuse.jvm.num.tasks=10

    $>hadoop jar /usr/lib/hadoop/hadoop-0.20.1+133-examples.jar
    randomwriter -D dfs.block.size=134217728 input

    Takes about 4 mins


    $>hadoop jar /usr/lib/hadoop/hadoop-0.20.1+133-examples.jar sort input
    output

    Takes about 11 mins (map takes about 4.5 mins)



    With the default configuration, the map tasks run for just a couple of
    seconds, with the average number of tasks running at any one time being
    just 20% of the map task capacity. Increasing the block size and
    reusing JVMs across tasks had the most noticeable impact on performance.


    -Bryan
  • Sudha sadhasivam at Oct 14, 2009 at 11:27 am
    Can use the terabyte sort to check performance.
    G Sudha Sadasivam
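
    (For reference, a sketch of the terasort trio as shipped in the 0.20
    examples jar; the row count and paths are only examples, and teragen
    writes 100-byte rows:

      hadoop jar hadoop-*-examples.jar teragen 10000000 /bench/tera-in
      hadoop jar hadoop-*-examples.jar terasort /bench/tera-in /bench/tera-out
      hadoop jar hadoop-*-examples.jar teravalidate /bench/tera-out /bench/tera-report
    )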

