Hi All,

I have a two-node CentOS cluster (master/slave) running CM/Hadoop/HBase,
using the default configuration that came with CM 4.0/CDH4.

Node 1 has:
Cloudera Manager server + CM agent + Postgres DB
NameNode
DataNode
HBase Master/RegionServer
Sqoop
JobTracker/TaskTracker

Node 2 has:
CM agent
DataNode
RegionServer
TaskTracker
I need to take a backup of this setup.

What is the best way to do it?

Any thoughts?

Thanks


  • Adam Smieszny at Sep 20, 2012 at 7:02 pm
    For HDFS, you would want to back up the metadata (copy everything from
    dfs.name.dir to a separate location).
    For HBase, see http://hbase.apache.org/book/ops.backup.html
    For CM, you would also want to back up the Postgres DB that stores the CM
    data: http://www.postgresql.org/docs/8.1/static/backup.html
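
    A rough sketch of what that could look like in practice (untested; the
    dfs.name.dir path and the Postgres DB/user names below are assumptions,
    not guaranteed CM defaults -- check your own configuration first):

    #!/bin/bash
    BACKUP_DIR=/mnt/backup/$(date +%F)
    mkdir -p "$BACKUP_DIR"

    # 1) HDFS metadata: archive the NameNode's dfs.name.dir
    #    (safest with the NameNode stopped, or right after a saveNamespace).
    tar czf "$BACKUP_DIR/namenode-meta.tar.gz" /dfs/nn   # assumed dfs.name.dir

    # 2) CM data: dump the Postgres database backing Cloudera Manager.
    pg_dump -h localhost -U scm scm > "$BACKUP_DIR/cm-scm.sql"

    # 3) HBase: one option from the HBase book is the MapReduce Export job,
    #    run once per table ("mytable" is a placeholder).
    hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/hbase/mytable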

    Hope this helps
    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://tiny.cloudera.com/about
    917.830.4156 | http://www.linkedin.com/in/adamsmieszny
  • Mike at Sep 21, 2012 at 6:42 pm
    So, in case of a cluster crash: reinstall CM 4.0 and CDH4, copy the
    backed-up dfs.name.dir contents to the new dfs.name.dir, import the
    HBase tables, import the Postgres data, and start the new cluster,
    right? Will that work?
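
    In script form, I imagine the restore would look roughly like this
    (untested; paths, table and database names are the same placeholders as
    in the backup sketch above):

    # 1) With the new NameNode stopped, restore the HDFS metadata.
    tar xzf namenode-meta.tar.gz -C /    # restores the assumed /dfs/nn

    # 2) Load the CM dump into the new Postgres instance.
    psql -h localhost -U scm scm < cm-scm.sql

    # 3) Re-import each exported HBase table (the table must exist first).
    hbase org.apache.hadoop.hbase.mapreduce.Import mytable /backups/hbase/mytable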

    Thanks
  • Mike at Nov 19, 2012 at 5:56 pm
    Could you please respond?
  • Adam Smieszny at Nov 19, 2012 at 10:38 pm
    Hi Mike,

    It really depends on the exact failure scenario you are protecting against.

    For instance:
    1) If your NameNode crashes but your DataNodes are fine, you could
    restore the dfs.name.dir contents and you should be OK.
    2) If your entire HBase cluster went down and you needed to restore the
    data, you could import the data into HBase as per the link provided.
    3) If your Cloudera Manager server went down and you had to bring up a
    new one, you could do so and retain the history using the Postgres data
    as described.

    To your question: if your two-node cluster were blown away entirely and
    you wanted to get back to where you were, you would have to architect
    something more robust, encompassing these pieces but more as well
    (likely a full dump of the data itself to somewhere off-cluster, which
    is practical given that it's only two nodes, though that may change as
    you scale). One minimal sketch of that full dump is shown below.
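
    For example, a periodic DistCp of the whole HDFS namespace to external
    storage (the target URI is an assumption -- point it at whatever
    off-cluster storage you actually have):

    # Copy all of HDFS to an NFS-mounted backup volume; the mount must be
    # present on every node that runs map tasks.
    hadoop distcp hdfs://node1:8020/ file:///mnt/offcluster-backup/$(date +%F)/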

    Thanks,
    Adam

    --
    Adam Smieszny
    Cloudera | Systems Engineer | http://tiny.cloudera.com/about
    917.830.4156 | http://www.linkedin.com/in/adamsmieszny
