Hi everyone,

We are going to create a new Hadoop cluster in our company, and I would like
some advice from you:

1. Has anyone stored all of their Hadoop data on NetApp or another storage
system rather than on local disks? Do we have to store the data on local disks,
and if so, is that because of performance issues?

2. What do you think about running Hadoop nodes in virtual (VMware)
servers?

Thanks...


  • Kai Voigt at Aug 25, 2011 at 7:02 am
    Hi,

    On 25.08.2011 at 08:58, Hakan İlter wrote:
    We are going to create a new Hadoop cluster in our company, and I would like
    some advice from you:

    1. Has anyone stored all of their Hadoop data on NetApp or another storage
    system rather than on local disks? Do we have to store the data on local disks,
    and if so, is that because of performance issues?

    HDFS and MapReduce benefit massively from local storage, so using any kind of remote storage (SAN, Amazon S3, etc.) will make things slower.

    2. What do you think about running Hadoop nodes in virtual (VMware)
    servers?

    Virtualization can make certain things easier to handle, but it's a layer that will eat resources.

    Kai

    --
    Kai Voigt
    k@123.org
  • Sagar Shukla at Aug 25, 2011 at 7:21 am
    Hi Hakan,

    Please find my comments inline, marked <sagar>:



    -----Original Message-----
    From: Hakan İlter
    Sent: Thursday, August 25, 2011 12:28 PM
    To: common-user@hadoop.apache.org
    Subject: Hadoop with Netapp



    Hi everyone,



    We are going to create a new Hadoop cluster in our company, and I would like some advice from you:



    1. Has anyone stored all of their Hadoop data on NetApp or another storage system rather than on local disks? Do we have to store the data on local disks, and if so, is that because of performance issues?

    <sagar>: Yes, we were using SAN LUNs for storing Hadoop data. A SAN performs better than NAS when writing data to storage. SAN LUNs can also be auto-mounted when the system boots.



    2. What do you think about running Hadoop nodes in virtual (VMware) servers?

    <sagar>: If high-speed computing is not a requirement for you, then running Hadoop nodes in a VM environment could be a good option. One slight drawback is that when a VM crashes, its in-memory data is lost and cannot be recovered. Hadoop handles some failover itself, but there is still some risk involved, and it requires good HA capabilities.



    Thanks,

    Sagar



    Thanks...

  • Steve Loughran at Sep 1, 2011 at 9:49 am

    On 25/08/11 08:20, Sagar Shukla wrote:
    1. Has anyone stored all of their Hadoop data on NetApp or another storage system rather than on local disks? Do we have to store the data on local disks, and if so, is that because of performance issues?

    <sagar>: Yes, we were using SAN LUNs for storing Hadoop data. A SAN performs better than NAS when writing data to storage. SAN LUNs can also be auto-mounted when the system boots.

    Silly question: why? SANs are SPOFs (Gray & van Ingen, MS, 2005; SAN
    responsible for 11% of TerraServer downtime).

    Was it because you had the rack and wanted to run Hadoop, or did you
    want a more agile cluster? A SAN is going to increase your cost of
    storage dramatically, which means you pay more per TB or end up with
    fewer TB of storage. I wouldn't go this way for a dedicated Hadoop
    cluster. For a multi-use cluster, it's a different story.
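
To make the cost-per-TB point concrete, here is a minimal sketch of the arithmetic. It is an illustration only: the budget and both prices are hypothetical placeholders, not figures from this thread. The one real value is the replication factor of 3, which is the HDFS default (dfs.replication).

```python
def usable_tb(budget, price_per_raw_tb, replication=3):
    """Usable HDFS capacity (in TB) that a fixed budget buys.

    Every block is stored `replication` times, so usable capacity is
    raw capacity divided by the replication factor.
    """
    if price_per_raw_tb <= 0 or replication < 1:
        raise ValueError("price must be positive and replication >= 1")
    raw_tb = budget / price_per_raw_tb
    return raw_tb / replication


# Hypothetical numbers, chosen only to make the arithmetic visible.
BUDGET = 90_000
LOCAL_DISK_PRICE_PER_TB = 100
SAN_PRICE_PER_TB = 1_000

local_tb = usable_tb(BUDGET, LOCAL_DISK_PRICE_PER_TB)
san_tb = usable_tb(BUDGET, SAN_PRICE_PER_TB)

print(f"local disks: {local_tb:.0f} TB usable")
print(f"SAN:         {san_tb:.0f} TB usable")
```

With the same budget, a tenfold higher price per raw TB leaves a tenth of the usable capacity, which is the trade-off described above.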



    2. What do you think about running Hadoop nodes in virtual (VMware) servers?

    <sagar>: If high-speed computing is not a requirement for you, then running Hadoop nodes in a VM environment could be a good option. One slight drawback is that when a VM crashes, its in-memory data is lost and cannot be recovered. Hadoop handles some failover itself, but there is still some risk involved, and it requires good HA capabilities.

    I do it for dev and test work, and for isolated clusters in a shared
    environment.

    - For CPU-bound work, it actually works quite well, as there's no
    significant overhead.

    - For HDD access (reading from the FS, writing to the FS, and storing
    transient spill data) you take a tangible performance hit. That's OK if
    you can afford to wait or rent a few extra CPUs, and if your block size is
    such that those extra servers can help out, which may be the case in the map
    phase more than the reduce phase.
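
The block-size remark can be sketched roughly as follows. This is an illustration under stated assumptions, not something from the original post: it assumes one map task per HDFS block (the usual case for splittable input), and the input size, slot count, and node counts are hypothetical. The 64 MB block size was the HDFS default in Hadoop releases of that period.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def map_tasks(input_bytes, block_bytes):
    """Number of map tasks, assuming one task per HDFS block."""
    return math.ceil(input_bytes / block_bytes)


def map_waves(tasks, nodes, slots_per_node):
    """Rounds of map tasks needed when all map slots run in parallel."""
    return math.ceil(tasks / (nodes * slots_per_node))


# Hypothetical job: 10 GB of input, 64 MB blocks, 2 map slots per node.
tasks = map_tasks(10 * GB, 64 * MB)

for nodes in (10, 20, 40, 80, 160):
    waves = map_waves(tasks, nodes, slots_per_node=2)
    print(f"{nodes:3d} nodes -> {waves} wave(s) of map tasks")
```

Once the number of map slots reaches the number of blocks, adding more servers no longer shortens the map phase. A smaller block size yields more tasks, so extra servers keep helping for longer.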


    Some Hadoop-ish projects, Stratosphere from TU Berlin in particular, are
    designed for VM infrastructure, so they come up with execution plans that
    use VMs efficiently.

    -steve

Discussion Overview
group: common-user @
category: hadoop
posted: Aug 25, '11 at 6:58a
active: Sep 1, '11 at 9:49a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
