Hey all,

While using the (latest as of Friday) Cloudera-created EC2 scripts, I
noticed that the terminate-cluster script kills ALL of your EC2 nodes,
not just those associated with the cluster.  This has been
documented before in HADOOP-1504
(http://issues.apache.org/jira/browse/HADOOP-1504), and a fix was
integrated way back on June 21, 2007. My questions are:

1) Is anyone else seeing this? I can reproduce this behavior consistently.
AND
2) Is this a regression in the common code, a problem with the
Cloudera scripts, or just user error on my part?

Just trying to get to the bottom of this so no one else has to see all
of their EC2 instances die accidentally :(

Thanks!

-Mark

  • Tom White at Oct 19, 2009 at 3:52 pm
    Hi Mark,

    Sorry to hear that all your EC2 instances were terminated. Needless to
    say, this should certainly not happen.

    The scripts are a Python rewrite (see HADOOP-6108) of the bash ones, so
    HADOOP-1504 is not applicable, but the behaviour should be the same:
    the terminate-cluster command lists the instances that it will
    terminate, and prompts for confirmation that they should be
    terminated. Is it listing instances that are not in the cluster? I
    have used this script a lot and it has never terminated any instances
    that are not in the cluster.
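
    (For anyone reading along, the flow is roughly: query EC2 for the
    instances in the cluster's security groups, print them, and only
    terminate after an explicit confirmation.  The sketch below is purely
    illustrative and is not the actual script: it uses boto3, which the
    2009 scripts do not use, and it assumes a <cluster>-master /
    <cluster>-slave group-naming convention.)

    import boto3

    def terminate_cluster(cluster_name, region='us-east-1'):
        # Illustrative only: the group names and region are assumptions,
        # not the layout used by the real Cloudera/Apache scripts.
        ec2 = boto3.client('ec2', region_name=region)
        groups = [cluster_name + '-master', cluster_name + '-slave']
        resp = ec2.describe_instances(Filters=[
            {'Name': 'instance.group-name', 'Values': groups},
            {'Name': 'instance-state-name', 'Values': ['pending', 'running']},
        ])
        ids = [inst['InstanceId']
               for r in resp['Reservations']
               for inst in r['Instances']]
        if not ids:
            print('No running instances found for cluster ' + cluster_name)
            return
        # List what will be terminated and prompt before doing anything.
        print('About to terminate: ' + ', '.join(ids))
        if input('Proceed? [y/N] ').strip().lower() == 'y':
            ec2.terminate_instances(InstanceIds=ids)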

    What are the names of the security groups that the instances are in
    (both those in the cluster, and those outside the cluster that are
    inadvertently terminated)?

    Thanks,
    Tom
  • Mark Stetzer at Oct 19, 2009 at 4:34 pm
    Hi Tom,

    The terminate-cluster script only lists the instances that are part of
    the cluster (master and all slaves) as far as I can tell. As an
    example, I set up a cluster of 1 master and 5 slaves, then started an
    additional non-Hadoop server running a completely different AMI
    (OpenSolaris 2009.06, just to be very different) via the AWS
    management console.  If I remember correctly, terminate-cluster only
    listed the 6 instances that were part of the cluster.

    I have 4 security groups: default, default-master, default-slave, and
    mark-default. mark-default wasn't even added until after I started
    the Hadoop cluster; I added it to log in to the OpenSolaris instance.
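
    (Side note for anyone checking their own setup: an easy way to see
    which security groups each instance actually ended up in is to dump
    them straight from the EC2 API.  This is just a sketch using boto3,
    not whatever the scripts use internally, and the region is a
    placeholder.)

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')
    # Print every instance with its state and security group names.
    for reservation in ec2.describe_instances()['Reservations']:
        for inst in reservation['Instances']:
            groups = [g['GroupName'] for g in inst.get('SecurityGroups', [])]
            print(inst['InstanceId'], inst['State']['Name'], groups)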

    Does this help at all? Thanks.

    -Mark
  • Tom White at Oct 19, 2009 at 4:53 pm

    I think there is a bug here. I've filed
    https://issues.apache.org/jira/browse/HADOOP-6320. As an immediate
    workaround, you can avoid calling the Hadoop cluster "default", and
    make sure that you don't create non-Hadoop EC2 instances in the
    cluster group.
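
    (One plausible reading of what happened, to make the failure mode
    concrete: if the cluster is literally named "default", then its
    security group is EC2's account-wide "default" group, which is also
    where console-launched instances land when no group is specified, so
    a query for "instances in the cluster's group" sweeps them up too.
    The boto3 sketch below only illustrates that over-matching, with
    assumed group naming, not the scripts' actual logic.)

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')
    cluster = 'default'   # the problematic cluster name
    resp = ec2.describe_instances(Filters=[
        {'Name': 'instance.group-name',
         'Values': [cluster, cluster + '-master', cluster + '-slave']},
    ])
    # "matched" now includes any unrelated instance that happens to sit
    # in the account-wide "default" group, so a terminate call on this
    # list would kill non-Hadoop instances as well.
    matched = [inst['InstanceId']
               for r in resp['Reservations']
               for inst in r['Instances']]
    print(matched)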

    Thanks,
    Tom

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 19, '09 at 3:41p
active: Oct 19, '09 at 4:53p
posts: 4
users: 2 (Mark Stetzer: 2 posts, Tom White: 2 posts)
website: hadoop.apache.org...
irc: #hadoop
