Hello,

I am trying to use the Cloudera Manager Java API to do an HA cluster
deployment (HDFS and JobTracker). I was wondering what the prescribed
steps are. This is what I have tried so far, without much success:

1. Set up an HDFS cluster with two NameNodes, multiple DataNodes, and an odd
number of JournalNodes.
2. Designate one NameNode as active and the other as standby.
3. Enable high availability.
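Step 1 asks for an odd number of JournalNodes. With the Quorum Journal Manager, each edit must be acknowledged by a majority of JournalNodes, so N JournalNodes tolerate at most (N - 1) / 2 failures, and moving from an odd count to the next even count adds no fault tolerance. A minimal plain-Java sketch of that arithmetic (the class and method names are illustrative only, not part of the CM API):

```java
public class JournalQuorum {
    // An edit is durable once a majority of JournalNodes have acknowledged it.
    static int majority(int journalNodes) {
        return journalNodes / 2 + 1;
    }

    // How many JournalNodes can be down while a majority is still reachable.
    static int toleratedFailures(int journalNodes) {
        return (journalNodes - 1) / 2;
    }

    public static void main(String[] args) {
        // Print majority size and fault tolerance for 1..6 JournalNodes.
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d JournalNodes: majority=%d, tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

For example, 3 and 4 JournalNodes both tolerate a single failure, which is why the usual advice is 3, 5, 7, and so on.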

The process is flaky and fails intermittently with errors like "Failed to
format Namenode". Is there a prescribed recipe for doing this with the API? An
example would be really awesome.

Thanks,


To unsubscribe from this group and stop receiving emails from it, send an email to scm-users+unsubscribe@cloudera.org.


  • Darren Lo at Jan 16, 2014 at 7:31 pm
    Hi Swarnim,

    You will generally need to configure all required fields and then replicate
    every step you see in the "First Run" page of the wizard. For enabling HA,
    there are API commands that should have fairly descriptive documentation.
    Let us know if you hit problems.

    You can see a mostly-functional example here:
    https://github.com/eBay/hadrian

    Be sure to read my comments here, which I don't think have all been
    addressed yet:
    https://github.com/eBay/hadrian/issues/1

    In our own internal automated testing, we use the API to set up clusters
    and haven't seen intermittent failures formatting the NameNode. You may
    have problems if you retry a fresh install without first cleaning up all
    the data dirs (this cleanup can't be done via CM, which generally tries
    not to blow up your data).
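    Because CM won't wipe data directories for you, one way to rule out stale state before retrying a fresh install is to confirm on each host that the configured directories (for example the JournalNode edits dir, dfs_journalnode_edits_dir) are empty. The sketch below is a hypothetical local helper built only on java.nio.file; it does not call the CM API, and you would pass it whatever paths are actually configured for your roles.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class DataDirCheck {
    // Returns the directories in `dirs` that exist and contain at least one entry.
    // Paths that do not exist or are not directories are skipped.
    static List<Path> nonEmptyDirs(List<Path> dirs) {
        List<Path> nonEmpty = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            // Files.list is lazy; close the stream so the directory handle is released.
            try (Stream<Path> entries = Files.list(dir)) {
                if (entries.findAny().isPresent()) {
                    nonEmpty.add(dir);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        // Pass the data dirs configured for the roles on this host (placeholders):
        //   java DataDirCheck /path/to/nn /path/to/jn
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg));
        }
        List<Path> nonEmpty = nonEmptyDirs(dirs);
        if (nonEmpty.isEmpty()) {
            System.out.println("All given data dirs are empty or absent");
        } else {
            for (Path dir : nonEmpty) {
                System.out.println("Not empty: " + dir);
            }
        }
    }
}
```

    Any directory it reports still holds state from a previous attempt and is a likely candidate for the kind of format failure described above.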

    Thanks,
    Darren


  • Kulkarni Swarnim at Jan 16, 2014 at 11:45 pm
    Hi Darren,

    Thanks for the reply.

    I was able to get a perfectly healthy *non-HA* cluster up and running
    without any problems at all. I followed the second method described in the
    doc here[1] to set up the full cluster in one go.

    To set up HA, I used this command[2]. As the javadoc on the command
    states:

    "The command will set up the given "active" and "stand-by" NameNodes as an
    HA pair. *Both nodes need to already exist*."

    This gave me the impression that the initial cluster should already have
    two NameNodes before running this command. (I tried other permutations as
    well, such as setting up the second node as a Secondary NameNode first and
    then converting it to a standby node, but to no avail.)

    After that, in the arguments[3], I specified one NameNode as active and
    the other as standby, chose quorum-based storage, and executed the HA
    command[2].

    It is entirely possible that my cluster is polluted in some way (although
    I tried deleting the data directories, with no luck). I just wanted to have
    my steps verified to make sure there wasn't anything wrong with the
    procedure I was following.

    Thanks again for your help.

    [1] http://cloudera.github.io/cm_api/apidocs/v5/path__clusters_-clusterName-_services.html
    [2] http://cloudera.github.io/cm_api/javadoc/4.6.0/com/cloudera/api/v1/ServicesResource.html#hdfsEnableHaCommand(java.lang.String, com.cloudera.api.model.ApiHdfsHaArguments)
    [3] http://cloudera.github.io/cm_api/javadoc/4.6.0/com/cloudera/api/model/ApiHdfsHaArguments.html

    --
    Swarnim

  • Darren Lo at Jan 17, 2014 at 12:09 am
    Hi Swarnim,

    You can follow your steps to get a non-HA cluster first, then do the
    following:
    1) Create a NameNode role and 3 JournalNode (Quorum Journal) roles. Do not
       start them.
       * You may need to configure the JournalNode edits directory
         (dfs_journalnode_edits_dir), probably in the base role config group
         for JournalNodes.
       * Assuming your existing steps used the default (aka "base") NameNode
         role config group for configuration, the new NameNode's
         configuration should already be correct.
       * The Secondary NameNode and your new NameNode should have different
         data dirs.
       * The data dirs for these roles should all be empty.
    2) Run the enable HA command with the correct arguments (it sounds like
       you're already doing this).
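    The preconditions in step 1 can be checked mechanically before running the enable HA command. The sketch below is a hypothetical helper, not part of the CM API: it assumes you have already read the Secondary NameNode and new NameNode data dirs from their role configs, and it checks that there is an odd number (at least 3) of JournalNodes and that the two roles share no directory. It does not check that the directories are empty; that still has to be verified on the hosts themselves.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HaPlanCheck {
    // Returns human-readable problems; an empty list means these checks passed.
    static List<String> check(int journalNodes, List<String> secondaryNnDirs, List<String> newNnDirs) {
        List<String> problems = new ArrayList<>();
        // QJM needs a majority of JournalNodes; use an odd count of at least 3.
        if (journalNodes < 3 || journalNodes % 2 == 0) {
            problems.add("expected an odd number (>= 3) of JournalNodes, got " + journalNodes);
        }
        // Normalize so that "/data/x/../nn" and "/data/nn" compare equal.
        // Only exact matches are caught; nested directories are not detected.
        Set<Path> snnDirs = new HashSet<>();
        for (String dir : secondaryNnDirs) {
            snnDirs.add(Paths.get(dir).toAbsolutePath().normalize());
        }
        for (String dir : newNnDirs) {
            Path p = Paths.get(dir).toAbsolutePath().normalize();
            if (snnDirs.contains(p)) {
                problems.add("dir shared by Secondary NameNode and new NameNode: " + p);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder directories, for illustration only.
        List<String> problems = check(3,
                Arrays.asList("/data/1/dfs/snn"),
                Arrays.asList("/data/1/dfs/nn"));
        System.out.println(problems.isEmpty() ? "Plan passes these checks" : String.join("\n", problems));
    }
}
```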

    Hope this helps,
    Darren


Discussion Overview
group: scm-users
categories: hadoop
posted: Jan 16, '14 at 7:24p
active: Jan 17, '14 at 12:09a
posts: 4
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Darren Lo: 2 posts, Kulkarni Swarnim: 2 posts
