Hi All,

I have a 2-node CentOS cluster (Master/Slave) running CM/Hadoop/HBase as
listed below.

Both nodes run in VMs. If I back up the VMs (*.vmdk image files)
every hour, then in case of a cluster failure I can simply create 2 new VMs
from the backed-up vmdk files and recover the cluster (of course, the last
hour of data is gone).

Will CM/Hadoop/HBase work fine? Thoughts?

Thanks

Node 1 has:
Cloudera Manager server + CM agent + Postgres DB
NameNode
DataNode
HBase Master/RegionServer
Sqoop
JobTracker/TaskTracker

Node 2 has:
CM agent
DataNode
RegionServer

  • Mike at Sep 24, 2012 at 6:40 pm
    Could you please respond?
  • Vikas Singh at Sep 24, 2012 at 7:03 pm
    How are you backing up these vmdks? Are you also backing up the RAM
    state of the VM, or will that be lost? What you want is the ability to
    take a snapshot/backup of the whole system while it is in a
    consistent state (some technologies that deal with this are
    Microsoft's Volume Shadow Copy and device mapper snapshots in Linux).

    Depending on how critical this is for your deployment, you may want to
    look into it. The last thing you want is a backup that doesn't work
    when you need it (dropping the RAM state when taking a vmdk backup will
    most likely cause filesystem issues). VMware does have a solution that
    allows taking backups.

    - Vikas
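
The point about a backup that fails when you need it suggests testing restores, not just taking them. Below is a minimal sketch of one such check, assuming the restored node has the `hdfs` command on its PATH (older releases use `hadoop fsck` instead) and that fsck prints its usual "is HEALTHY" / "is CORRUPT" summary line. The function names are made up for illustration and are not part of any Cloudera tooling.

```python
import subprocess


def fsck_reports_healthy(fsck_output: str) -> bool:
    """Return True if the fsck summary says HEALTHY and nothing says CORRUPT."""
    lines = fsck_output.splitlines()
    healthy = any("is HEALTHY" in line for line in lines)
    corrupt = any("is CORRUPT" in line for line in lines)
    return healthy and not corrupt


def check_restored_hdfs(path: str = "/") -> bool:
    """Run fsck on a restored cluster and interpret its summary.

    Assumption: the `hdfs` CLI is installed and the NameNode is already up.
    """
    result = subprocess.run(
        ["hdfs", "fsck", path],
        capture_output=True,
        text=True,
    )
    return fsck_reports_healthy(result.stdout)
```

Calling `check_restored_hdfs()` on a test restore of the hourly vmdk backup gives a quick yes/no signal for HDFS only; HBase and the CM database would need their own checks.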

  • Marcelo Vanzin at Sep 24, 2012 at 7:32 pm

    On Mon, Sep 24, 2012 at 12:03 PM, Vikas Singh wrote:
    Depending on how critical it is for your deployment, you may want to
    research into this. Last thing you want is that when you need your
    backup, it doesn't work (dropping RAM when taking vmdk backup most
    likely will cause filesystem issues). VMware does have solution that
    allows taking backups.

    BTW, what you want to look for is VMware's "quiesced snapshot"
    feature. It has sort of limited support for Linux guests (IIRC, with
    the latest release you have to write scripts to stop your applications
    so that their state is consistent in the snapshot), and it definitely
    does not support device mapper snapshots.

    If there is a command line tool for device mapper snapshots, you could
    hook that up with the script support above and do interesting things,
    though.

    --
    Marcelo
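
As a rough illustration of the "scripts to stop your applications" idea, here is a sketch of a pair of hooks that could run before and after a quiesced snapshot. Everything specific in it is an assumption: the step lists are placeholders (putting HDFS into safe mode and saving the namespace covers HDFS only, and HBase would need its own handling), the volume path in the usage note is hypothetical, and how the hooks get invoked depends on your VMware Tools release (on Linux guests it is documented to look for scripts such as `/usr/sbin/pre-freeze-script` and `/usr/sbin/post-thaw-script`, but check the docs for your version). The `lvcreate --snapshot` command is the LVM front end to device mapper snapshots.

```python
import subprocess
from typing import Callable, List, Sequence

Command = Sequence[str]

# Placeholder steps; replace with whatever quiesces your own services.
PRE_FREEZE: List[Command] = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],  # block HDFS writes
    ["hdfs", "dfsadmin", "-saveNamespace"],      # checkpoint NameNode metadata
    ["sync"],                                    # flush filesystem buffers
]
POST_THAW: List[Command] = [
    ["hdfs", "dfsadmin", "-safemode", "leave"],
]


def run_steps(
    steps: Sequence[Command],
    runner: Callable[[List[str]], object] = subprocess.check_call,
) -> int:
    """Run steps in order, stop at the first failure, return how many succeeded."""
    done = 0
    for cmd in steps:
        try:
            runner(list(cmd))
        except (subprocess.CalledProcessError, OSError):
            break
        done += 1
    return done


def lvm_snapshot_command(origin: str, name: str, size: str = "1G") -> List[str]:
    """Build (but do not run) an lvcreate command for an LVM snapshot."""
    return ["lvcreate", "--snapshot", "--size", size, "--name", name, origin]
```

A pre-freeze hook would call `run_steps(PRE_FREEZE)` and abort the snapshot if the result is less than `len(PRE_FREEZE)`; the post-thaw hook would call `run_steps(POST_THAW)`. `lvm_snapshot_command("/dev/vg0/data", "data_snap")` shows how a device mapper snapshot could be hooked into the same script, with the volume names being examples only.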
  • Philip Zeyliger at Sep 24, 2012 at 7:43 pm
    Keep in mind that, in many ways, Hadoop's fault-tolerance features imply
    that these shenanigans are less necessary.

    Assuming you're using a more typical number of machines (> 10), a typical
    individual failure won't affect Hadoop and you'll be able to simply replace
    the machine with a fresh one. Of course, your CM node, your JobTracker,
    your Hue server, etc. each run in only one place; if the
    machines hosting those services go down, it's best to have backups of the
    databases backing them, or to use VM tricks.

    -- Philip
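
For the "backups of the databases" option, a dump of the Cloudera Manager Postgres database is one possibility. The sketch below only builds a `pg_dump` command line. The database name, user, port, and output directory are assumptions for illustration (an embedded CM database is commonly reached on port 7432 with a database and user named `scm`, but verify this against your own installation), and the password is expected to come from `~/.pgpass` or `PGPASSWORD` rather than from the script.

```python
import subprocess
from datetime import datetime
from typing import List, Optional


def pg_dump_command(
    db: str,
    user: str,
    host: str = "localhost",
    port: int = 7432,                  # assumed port; check your install
    out_dir: str = "/var/backups/cm",  # hypothetical backup directory
    now: Optional[datetime] = None,
) -> List[str]:
    """Build a pg_dump command that writes a timestamped custom-format dump."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    out_file = f"{out_dir}/{db}-{stamp}.dump"
    return [
        "pg_dump",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--format", "c",   # custom format, restorable with pg_restore
        "--file", out_file,
        db,
    ]


def run_backup(db: str, user: str) -> None:
    """Run the dump; raises CalledProcessError if pg_dump fails."""
    subprocess.check_call(pg_dump_command(db, user))
```

Running `run_backup("scm", "scm")` from cron on the CM node would give a restorable copy of the CM configuration that does not depend on the VM image being consistent.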

Discussion Overview
group: cm-users @
category: hadoop
posted: Sep 24, '12 at 2:44p
active: Sep 24, '12 at 7:43p
posts: 5
users: 4
website: cloudera.com
irc: #hadoop
