Run Hadoop sort benchmark on Amazon EC2
---------------------------------------

Key: HADOOP-4382
URL: https://issues.apache.org/jira/browse/HADOOP-4382
Project: Hadoop Core
Issue Type: Test
Components: contrib/ec2
Reporter: Tom White
Assignee: Tom White


By running a benchmark on EC2 we can see how well Hadoop performs, how to tune it, and how performance changes between releases.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Tom White (JIRA) at Nov 26, 2008 at 5:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-4382:
    ------------------------------

    Attachment: hadoop-4382.patch

    A script that:

    1. Launches a cluster on EC2
    2. Waits for the cluster and Hadoop daemons to start
    3. Runs a small sort job to warm up the cluster
    4. Runs a sort job and emits the job duration
    5. Terminates the cluster
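
Step 4 above (run a sort job and emit its duration) can be sketched as a small shell helper. This is a minimal sketch, not the code from the attached patch: the function name `time_command` is hypothetical, `sleep 1` stands in for the real sort job, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Run a command and print its wall-clock duration in whole seconds.
# The command's own stdout is redirected to stderr so that stdout
# carries only the duration. (Hypothetical helper, not from the patch.)
time_command() {
  start=$(date +%s)
  status=0
  "$@" >&2 || status=$?
  end=$(date +%s)
  echo $((end - start))
  return $status
}

# Stand-in for the real sort job, which is not shown in this thread.
sort_duration=$(time_command sleep 1)
echo "sort_duration=${sort_duration}"
```

Capturing the duration in a variable such as `sort_duration` keeps it available for later reporting steps.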

    On an 8-node cluster, sorting 32 GB of data took 2742 seconds with the default hadoop-site.xml that the EC2 scripts use. Better settings should improve on this.
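
To put the figure above in perspective, the arithmetic can be worked through in a few lines of shell. This is only a sketch: the function name `throughput` is hypothetical, and it assumes 1 GB = 1024 MB and that all 8 worker nodes share the load evenly.

```shell
#!/bin/sh
# Aggregate and per-node sort throughput in MB/s.
# Arguments: data size in GB, duration in seconds, number of worker nodes.
# Assumes 1 GB = 1024 MB (an assumption, not stated in the thread).
throughput() {
  awk -v gb="$1" -v secs="$2" -v nodes="$3" 'BEGIN {
    total = gb * 1024 / secs
    printf "%.2f MB/s total, %.2f MB/s per node\n", total, total / nodes
  }'
}

# Figures quoted above: 32 GB sorted in 2742 seconds on 8 nodes.
throughput 32 2742 8
```

Under those assumptions the cluster sorts roughly 12 MB of data per second in aggregate.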

    There are several improvements that could be made to the script, in particular in detecting when the cluster is ready to go (the current script waits until 90% of the nodes are up, then waits 1 minute for Hadoop to start). There are more ideas here: http://www.nabble.com/Auto-shutdown-for-EC2-clusters-td20132561.html

    It would also be good to do multiple runs, discard the first, and compute an average.
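
Two of the ideas above, the 90% readiness threshold and averaging several runs while discarding the first, can be sketched in shell. The function names `cluster_ready` and `average_discarding_first` are hypothetical and not taken from the patch, the way node counts are obtained from EC2 is left out, and the durations in the usage lines are placeholder values rather than measurements.

```shell
#!/bin/sh
# Succeeds when at least 90% of the requested nodes are running.
# Integer arithmetic stands in for running/total >= 0.9.
cluster_ready() {
  running=$1
  total=$2
  [ $((running * 10)) -ge $((total * 9)) ]
}

# Prints the integer mean of all durations except the first (warm-up) run.
average_discarding_first() {
  if [ $# -lt 2 ]; then
    echo "need at least two runs" >&2
    return 1
  fi
  shift
  sum=0
  for d in "$@"; do
    sum=$((sum + d))
  done
  echo $((sum / $#))
}

# Placeholder inputs for illustration only.
cluster_ready 7 8 || echo "7 of 8 nodes: below the 90% threshold"
cluster_ready 9 10 && echo "9 of 10 nodes: ready"
average_discarding_first 100 60 70 80
```

Note that with exactly 8 nodes a 90% threshold can only be met when all 8 are up, since 7 of 8 is 87.5%.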

    This should be a good basis for running a regular EC2 benchmark from Hudson.

    Comments welcome.
  • Tom White (JIRA) at Nov 26, 2008 at 5:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651075#action_12651075 ]

    Tom White commented on HADOOP-4382:
    -----------------------------------

    I should add that the 8-node cluster used large EC2 instances, and that the 8 nodes do not include the namenode/jobtracker.
  • Nigel Daley (JIRA) at Nov 26, 2008 at 10:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651168#action_12651168 ]

    Nigel Daley commented on HADOOP-4382:
    -------------------------------------

    Looks good, Tom. A couple of comments:

    - should we also run sortvalidation to ensure the sort actually worked?
    - what bin dir are you putting the script in?
    - perhaps name the script sort-benchmark
    - add a line to echo the number of minutes into a file, as follows, for the Hudson plot:
    {quote}
    sort_minutes=`expr ${sort_duration} / 60`
    echo "YVALUE=${sort_minutes}" > sort_minutes.properties
    {quote}
  • Nigel Daley (JIRA) at Nov 26, 2008 at 10:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651169#action_12651169 ]

    Nigel Daley commented on HADOOP-4382:
    -------------------------------------

    Argh, Jira wiki notation ate my code snippet.

    {noformat}
    sort_minutes=`expr ${sort_duration} / 60`
    echo "YVALUE=${sort_minutes}" > sort_minutes.properties
    {noformat}
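
The snippet above can also be written with POSIX shell arithmetic instead of `expr`. This is a minimal sketch under a few assumptions: the function name `write_sort_minutes` and its optional output-path argument are hypothetical, while the `YVALUE` key and the default file name `sort_minutes.properties` come from the comment above.

```shell
#!/bin/sh
# Write the sort duration, converted to whole minutes, as YVALUE=<n>.
# Arguments: duration in seconds, optional output file
# (defaults to sort_minutes.properties, the name used in the thread).
write_sort_minutes() {
  sort_duration=$1
  out_file=${2:-sort_minutes.properties}
  # Integer division truncates, so 2742 seconds becomes 45 minutes.
  sort_minutes=$((sort_duration / 60))
  echo "YVALUE=${sort_minutes}" > "$out_file"
}

# Example: write to a temporary file and show the result.
example_file=$(mktemp)
write_sort_minutes 2742 "$example_file"
cat "$example_file"
rm -f "$example_file"
```

Because the division truncates, runs that differ by less than a minute can plot as the same value; writing seconds instead would keep that detail.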


  • Tom White (JIRA) at Nov 27, 2008 at 1:35 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-4382:
    ------------------------------

    Attachment: hadoop-4382-v2.patch

    Thanks for the comments Nigel.

    New patch incorporating the suggestions. (I've created the patch from the base of Hadoop this time, so the script goes in src/contrib/ec2/bin.)
  • Nigel Daley (JIRA) at Dec 1, 2008 at 5:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Nigel Daley updated HADOOP-4382:
    --------------------------------

    Hadoop Flags: [Reviewed]

    +1

Discussion Overview
group: common-dev @
categories: hadoop
posted: Oct 9, '08 at 8:13a
active: Dec 1, '08 at 5:14p
posts: 7
users: 1
website: hadoop.apache.org...
irc: #hadoop
