Create scripts to run Hadoop on Amazon EC2
------------------------------------------

Key: HADOOP-884
URL: https://issues.apache.org/jira/browse/HADOOP-884
Project: Hadoop
Issue Type: New Feature
Components: fs
Affects Versions: 0.10.1
Reporter: Tom White
Assigned To: Tom White


It is already possible to run Hadoop on Amazon EC2 (http://wiki.apache.org/lucene-hadoop/AmazonEC2), but it is a rather involved, largely manual process. Scripts that automate image creation and cluster launch (as far as possible) will make it much easier to use Hadoop on EC2.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

  • Tom White (JIRA) at Jan 11, 2007 at 9:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Work on HADOOP-884 started by Tom White.
  • Tom White (JIRA) at Jan 17, 2007 at 9:49 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-884:
    -----------------------------

    Component/s: (was: fs)
    scripts
  • Tom White (JIRA) at Jan 18, 2007 at 9:30 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-884:
    -----------------------------

    Attachment: hadoop-ec2-v1.tar.gz
  • Tom White (JIRA) at Jan 18, 2007 at 9:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465698 ]

    Tom White commented on HADOOP-884:
    ----------------------------------

    I've attached a collection of scripts for this feature. They are still rough around the edges and not ready for inclusion yet (indeed, they should probably be kept separate from the Hadoop distribution), but they work for me on Mac OS X and Ubuntu. I've added instructions to the wiki at http://wiki.apache.org/lucene-hadoop/AmazonEC2.

    There are lots of improvements that could be made.

    * Create a Hadoop AMI that accepts a parameterized launch to set the cluster size and master hostname (see http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/AESDG-chapter-instancedata.html). On startup, such an instance would modify the Hadoop config files to reflect the cluster size and master hostname.
    * Setting up DNS is a pain. We could either automate the DNS configuration using DynDNS's webservice (https://www.dyndns.com/developers/specs/syntax.html), or do away with having to set up DNS altogether.
    * Create a public Hadoop AMI (for each Hadoop version) so people don't need to build their own. See http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured.
    * Adapt `run-hadoop-cluster` to take the jar containing the MapReduce job as a parameter.
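    A minimal sketch of the DynDNS option in the second bullet. It assumes DynDNS's dyndns2 HTTP update protocol: an authenticated GET to `https://members.dyndns.org/nic/update` with `hostname` and `myip` query parameters. The function name, hostname, IP and credentials are hypothetical placeholders, and the request is only built, not sent.

```python
import base64
import urllib.parse
import urllib.request

# Assumed endpoint of DynDNS's dyndns2 update protocol; verify against the DynDNS spec.
DYNDNS_UPDATE_URL = "https://members.dyndns.org/nic/update"


def build_dyndns_update(hostname: str, ip: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that points `hostname` at `ip`.

    Hypothetical helper for illustration; not part of the attached scripts.
    """
    query = urllib.parse.urlencode({"hostname": hostname, "myip": ip})
    request = urllib.request.Request(f"{DYNDNS_UPDATE_URL}?{query}")
    # The update API authenticates with HTTP basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request
```

    The master could send this request with `urllib.request.urlopen(...)` after it boots, so that slaves and users can reach it by a stable name instead of an address that changes on every launch.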

  • James P. White (JIRA) at Jan 18, 2007 at 7:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465849 ]

    James P. White commented on HADOOP-884:
    ---------------------------------------

    I'm quite sure the solution to the DNS problem is Zeroconf.

    http://www.ifcx.org/wiki/LocalNetworking.html

    http://zeroconf.org/

    Amazon is already using it for the parameterized launch. That's where the funny "169.254.169.254" address comes from.

    http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/TechnicalFAQ.html#d0e14061

    There are several ways that this can be approached. The one that would help the most people would be to make Hadoop Zeroconf-aware (slaves using service discovery to find the master), but probably the place to start is to just enhance these EC2 scripts.
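    The observation about the metadata address can be checked with Python's standard `ipaddress` module. 169.254.169.254 falls inside 169.254.0.0/16, the IPv4 link-local block used for Zeroconf-style self-assigned addresses (RFC 3927). This sketch only classifies addresses; it does not perform any Zeroconf service discovery.

```python
import ipaddress

# The EC2 instance metadata address mentioned in the comment.
METADATA_ADDRESS = ipaddress.ip_address("169.254.169.254")
# The IPv4 link-local block (RFC 3927) used for self-assigned addresses.
LINK_LOCAL_BLOCK = ipaddress.ip_network("169.254.0.0/16")


def is_link_local(address: str) -> bool:
    """Return True if `address` lies in the IPv4 link-local block."""
    return ipaddress.ip_address(address) in LINK_LOCAL_BLOCK
```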
  • Doug Cutting (JIRA) at Jan 18, 2007 at 8:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465855 ]

    Doug Cutting commented on HADOOP-884:
    -------------------------------------

    I don't think these should go in the normal bin/ directory, but I think including them in the distribution tarfile might be good. They could perhaps go in contrib/ec2/bin/?
  • Tom White (JIRA) at Jan 18, 2007 at 9:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465883 ]

    Tom White commented on HADOOP-884:
    ----------------------------------

    Yes, contrib/ec2/bin/ sounds like the right place.
  • Doug Cutting (JIRA) at Jan 18, 2007 at 10:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465890 ]

    Doug Cutting commented on HADOOP-884:
    -------------------------------------

    Please mark this as "Patch Available" when you feel these scripts are ready for inclusion. Hopefully they'll make the 0.11 release in two weeks.
  • Lee Faris (JIRA) at Jan 23, 2007 at 8:42 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466672 ]

    Lee Faris commented on HADOOP-884:
    ----------------------------------

    I was thinking more along the lines of calling the EC2 web service directly via Java. The command line tools are thin wrappers around the web service.
  • Tom White (JIRA) at Jan 23, 2007 at 9:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466679 ]

    Tom White commented on HADOOP-884:
    ----------------------------------

    I agree that long term it would be more efficient to call the EC2 web service via Java, and these scripts could be the basis for this. At the moment, I'm focusing on getting the scripts working smoothly.
  • Tom White (JIRA) at Jan 29, 2007 at 9:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-884:
    -----------------------------

    Attachment: hadoop-884.patch

    The attached patch includes the Hadoop EC2 scripts in contrib/ec2/bin. I think they are ready for inclusion in the main distribution now.

    I have extended the scripts since the version in the tar.gz file to make them more robust: they no longer have to be unpacked into and invoked from the user's home directory. More significantly, I have used a parameterized launch to set the cluster size and master hostname. Previously, you had to build an image for a particular cluster size and hostname; now you can build one image and choose the cluster size and hostname at launch time. (This is a step towards shared Hadoop images.)
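    A minimal sketch of how an instance booted from a single image might pick up its launch parameters. It makes several assumptions:
    * The user data is a string such as `MASTER_HOST=... NUM_SLAVES=...`. This is a hypothetical format, not necessarily what the patch uses.
    * The data is read from the metadata service's `latest/user-data` path.
    * The output is a minimal `hadoop-site.xml` setting `fs.default.name`, `mapred.job.tracker` and `dfs.replication`.
    * The ports 50001 and 50002 and the replication rule are illustrative choices.

```python
import urllib.request
from xml.sax.saxutils import escape

# EC2 instance metadata endpoint for user data (assumed `latest` path).
USER_DATA_URL = "http://169.254.169.254/latest/user-data"

# Hypothetical ports for the namenode and jobtracker.
NAMENODE_PORT = 50001
JOBTRACKER_PORT = 50002


def fetch_user_data(timeout: float = 2.0) -> str:
    """Fetch the raw user data passed at launch (only works on an EC2 instance)."""
    with urllib.request.urlopen(USER_DATA_URL, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_user_data(text: str) -> dict:
    """Parse whitespace-separated KEY=VALUE pairs (hypothetical format)."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '='
            params[key] = value
    return params


def render_hadoop_site(master_host: str, num_slaves: int) -> str:
    """Render a minimal hadoop-site.xml pointing this node at the master."""
    properties = {
        "fs.default.name": f"{master_host}:{NAMENODE_PORT}",
        "mapred.job.tracker": f"{master_host}:{JOBTRACKER_PORT}",
        # Illustrative: never request more replicas than there are datanodes.
        "dfs.replication": str(max(1, min(3, num_slaves))),
    }
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append(
            f"  <property><name>{escape(name)}</name><value>{escape(value)}</value></property>"
        )
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"
```

    On an instance, a startup script could pass `parse_user_data(fetch_user_data())` to `render_hadoop_site(...)` and write the result into Hadoop's conf directory. Because the values arrive at launch time, the same image serves any cluster size or master hostname.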

    As for the other improvements, I will create new Jira issues for them, since the basic scripts are in a working state (although I would love feedback if anyone tries them out).

    James - thank you for the suggestion about Zeroconf. I've not had any experience with it, so any help would be appreciated.
  • Tom White (JIRA) at Jan 29, 2007 at 9:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tom White updated HADOOP-884:
    -----------------------------

    Status: Patch Available (was: In Progress)
  • Hadoop QA (JIRA) at Jan 29, 2007 at 9:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468423 ]

    Hadoop QA commented on HADOOP-884:
    ----------------------------------

    +1, because http://issues.apache.org/jira/secure/attachment/12349853/hadoop-884.patch applied and successfully tested against trunk revision r501182.
  • Doug Cutting (JIRA) at Jan 30, 2007 at 12:25 am
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-884:
    --------------------------------

    Resolution: Fixed
    Fix Version/s: 0.11.0
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Tom!

    A couple of future improvements to ponder:
    - Perhaps the env file shouldn't be in Subversion; rather, a template should be checked in and copied into place. That way we don't risk checking in an edited version.
    - A bit of documentation, perhaps just a README, should ideally be bundled with this.
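    The first suggestion can be sketched as a small helper. It copies a checked-in template into place only when the real env file does not exist yet, so a user's edited copy is never overwritten. The file names `hadoop-ec2-env.sh.template` and `hadoop-ec2-env.sh` and the function name are hypothetical; the thread does not name the env file.

```python
import shutil
from pathlib import Path

# Hypothetical file names; the committed scripts may use different ones.
TEMPLATE_NAME = "hadoop-ec2-env.sh.template"
ENV_NAME = "hadoop-ec2-env.sh"


def ensure_env_file(bin_dir: Path) -> bool:
    """Copy the template to the env file if needed; return True if a copy was made."""
    template = bin_dir / TEMPLATE_NAME
    env_file = bin_dir / ENV_NAME
    if env_file.exists():
        # Never overwrite a user's edited copy.
        return False
    shutil.copyfile(template, env_file)
    return True
```

    Only the template would be kept in Subversion. The generated env file could be listed in the directory's `svn:ignore` property so an edited copy is never committed by accident.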

Discussion Overview
group: common-dev
categories: hadoop
posted: Jan 11, '07 at 9:30a
active: Jan 30, '07 at 12:25a
posts: 15
users: 1
website: hadoop.apache.org...
irc: #hadoop