I installed hadoop-0.20.2 in a Eucalyptus VM environment. The file system
is based on glusterfs, so it is a shared NAS. Although the nodes are quite
powerful (8 cores + 15 GB memory), the Hadoop namenode and datanodes
respond very slowly. For example, after running start-all.sh, the
datanodes take more than 5 minutes to become ready, and the namenode
stays in safe mode for a very long time. Moreover, the program also runs
much slower than it did on the old physical cluster nodes. I have tried
running Hadoop on a cluster of 15 VM nodes and on a pseudo-distributed
cluster on a single VM; both are very slow.

Is the NAS an I/O bottleneck? Creating HDFS on top of glusterfs is like
reinventing the wheel, so I tried adjusting the replication setting to
different values (1 to 4), but saw no improvement. I haven't tried the
CDH3 package yet, and I wonder whether switching to it would bring any
significant improvement. Any suggestions on this issue are highly
appreciated.

Shi

  • Sridhar basam at Aug 29, 2011 at 4:18 pm

    Your problems are likely due to your setup (the VMs and your NAS
    filesystem). Without additional information it is hard to say where the
    problem is, but installing CDH3 isn't going to fix your performance
    issues. CDH3 is based on the Apache distribution along with a few
    additional patches.

    You are better off running Hadoop on physical hardware with local storage.
    If you want to narrow down the problem, make one change at a time. It looks
    like you already had/have a cluster on physical hardware. Bring up a
    cluster on just VM hardware without Gluster. Time whatever benchmark you
    are using, then introduce another change and repeat the process.

    Sridhar
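
The one-change-at-a-time timing described above can start below Hadoop itself, by comparing raw file I/O on each storage backend. The following is a minimal sketch, not a full benchmark: the function name `time_write_read` and its default sizes are illustrative assumptions, and the read pass may be served from the OS page cache, so the write figure is usually the more telling one.

```python
import os
import sys
import tempfile
import time


def time_write_read(directory, size_mb=64, block_kb=1024):
    """Write, then read back, a scratch file in `directory`.

    Returns (write_seconds, read_seconds). Hypothetical helper for
    comparing storage backends; not part of Hadoop.
    """
    block_bytes = block_kb * 1024
    blocks = (size_mb * 1024) // block_kb
    block = os.urandom(block_bytes)

    fd, path = tempfile.mkstemp(dir=directory, prefix="iobench-")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data to storage before stopping the clock
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_bytes)
                if not chunk:
                    break
                total += len(chunk)
        read_s = time.perf_counter() - start

        # Sanity check: everything written was read back.
        assert total == blocks * block_bytes
    finally:
        os.remove(path)
    return write_s, read_s


if __name__ == "__main__":
    # Pass one directory per backend to compare, e.g. a local-disk
    # directory and the glusterfs mount point.
    for d in sys.argv[1:]:
        w, r = time_write_read(d)
        print(f"{d}: write {w:.2f}s, read {r:.2f}s")
```

Run it once against a directory on the VM's local disk and once against a directory on the glusterfs mount (the actual paths depend on your setup). A large gap between the two points at the storage layer rather than at Hadoop.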
  • Steve Loughran at Sep 1, 2011 at 9:51 am

    On 29/08/11 16:32, Shi Yu wrote:
    > I installed hadoop-0.20.2 in Eucalyptus VM environment. The file system
    > is based on glusterfs, so it is a shared NAS. Though the nodes are much
    > powerful (8 cores + 15G memory), I found the response of hadoop namenode
    > and data nodes became very slow. For example, after running
    > start-all.sh, the datanodes take more than 5 minutes to be ready. The
    > safe mode time is really really long. Moreover, the program also runs
    > much slower than it did on old physical cluster nodes. I have tried
    > running hadoop on a cluster containing 15 VM nodes, also on a pseudo
    > cluster on a single VM, all very slow. Is it because NAS is an IO
    > bottleneck? The HDFS is created on top of glusterfs like reinventing the
    > wheel, so I tried to adjust the replication setting to different values
    > (1 to 4) but no improvement. I haven't tried CDH3 package yet.

    Why use HDFS at all? If it's a shared filesystem, use file:// URLs.

    > I wonder
    > whether switching to CDH3 would bring any significant improvement.

    It won't.
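
One way to read the file:// suggestion: in Hadoop 0.20 the default filesystem is set by the `fs.default.name` property in core-site.xml, and pointing it at `file:///` makes jobs read and write the locally mounted filesystem instead of HDFS. The sketch below only renders such a config file; the helper name `core_site_xml` is hypothetical, and it assumes every node mounts the glusterfs volume at the same path so that file:// paths resolve identically everywhere.

```python
import xml.etree.ElementTree as ET


def core_site_xml(default_fs="file:///"):
    """Return a core-site.xml document that sets fs.default.name.

    Hypothetical helper; assumes the Hadoop 0.20 property name
    `fs.default.name`.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.default.name"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Print the document; redirect it into conf/core-site.xml to use it.
    print(core_site_xml())
```

With this in place, job input and output paths would refer to directories on the shared mount, and the namenode and datanodes would not be needed for storage. Treat this as a starting point to test, not a verified configuration for this cluster.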

Discussion Overview
group: common-user @
category: hadoop
posted: Aug 29, '11 at 3:34p
active: Sep 1, '11 at 9:51a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
