FAQ
Hi,

I have created a Hadoop cluster on a single machine using different VM
instances.

Will the replication factor be effective in this setup? I would also like to
know what HDFS performance to expect.




  • Oliver Fischer at Mar 26, 2009 at 1:23 am
    Hello Vishal,

    I did the same a few weeks ago. The most important fact is that it
    works. But it is horribly slow if you do not have enough RAM and multiple
    disks, since all I/O operations go to the same disk.
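
To put a rough number on the shared-disk point: HDFS places each replica of a block on a different DataNode, so replication does take effect across the VMs, but when every VM's virtual disk lives on the same physical disk, each logical byte is still written once per replica to that one device (and all replicas share a single point of failure). The sketch below is back-of-envelope arithmetic only. The function name and the 80 MB/s figure are assumptions made for illustration, and the model ignores seek contention, caching, and virtualization overhead.

```python
def effective_write_mb_per_s(disk_mb_per_s: float,
                             replication: int,
                             physical_disks: int = 1) -> float:
    """Crude upper bound on logical write throughput for one write stream.

    Assumes (hypothetically) that replicas are spread evenly over the
    available physical disks and that the disks are the only bottleneck.
    """
    if replication < 1 or physical_disks < 1:
        raise ValueError("replication and physical_disks must be >= 1")
    # Replicas cannot be spread over more disks than exist.
    disks_used = min(replication, physical_disks)
    # Each disk has to absorb this many copies of every logical byte.
    copies_per_disk = replication / disks_used
    return disk_mb_per_s / copies_per_disk


# Three DataNode VMs sharing one disk versus three separate disks.
shared = effective_write_mb_per_s(80.0, replication=3, physical_disks=1)
separate = effective_write_mb_per_s(80.0, replication=3, physical_disks=3)
print(f"shared disk: {shared:.1f} MB/s, separate disks: {separate:.1f} MB/s")
```

Under these assumptions a replication factor of 3 on one shared disk cuts the best-case write rate to about a third of what the disk could otherwise sustain.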

    Best regards,

    Oliver

    Vishal Ghawate wrote:
    Hi,

    I have created a Hadoop cluster on a single machine using different VM
    instances.

    Will the replication factor be effective in this setup? I would also like
    to know what HDFS performance to expect.



  • Edward Capriolo at Mar 26, 2009 at 4:32 pm
    I use Linux-VServer: http://linux-vserver.org/

    The Linux-VServer technology is a soft partitioning concept based on
    Security Contexts which permits the creation of many independent
    Virtual Private Servers (VPS) that run simultaneously on a single
    physical server at full speed, efficiently sharing hardware resources.

    Whenever people talk about virtual machines, I usually hear about
    VMware, Xen, and QEMU. For MY purposes Linux-VServer is far superior
    to all of them, and it is very helpful for the Hadoop work I do. (I only
    want Linux guests.)

    No emulation overhead - I installed VMware Server on my laptop and was
    able to get 3 Linux instances running before the system was unusable,
    and the instances were not even doing anything.

    With VServer my system is not wasting cycles emulating devices. Guests
    securely share a kernel and memory. You can effectively run many
    more VMs at once. This leaves the processor for user processes
    (Hadoop), not emulation overhead.

    A minimal installation is 50 MB. I do not need a multi-GB Linux
    install just to test a version of Hadoop. This allows me to recklessly
    make VMs for whatever I want and not have to worry about GB chunks of
    my hard drive going with each VM.

    I can tar up a VM and use it as a template to install another VM. Thus
    I can deploy a new system in under 30 seconds. The HTTP RPM install
    takes about 2 minutes.
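
The tar-as-template idea can be sketched generically as follows. This is not the VServer tooling itself: the function names and directory layout are hypothetical, and a real guest would also need root privileges, preserved ownership and device nodes, and its own per-guest configuration, none of which is handled here.

```python
import tarfile
from pathlib import Path


def make_template(guest_root: Path, archive: Path) -> None:
    """Pack the contents of a guest's root directory into a gzipped tarball.

    The archive must live outside guest_root, otherwise it would try to
    include itself.
    """
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries relative to the guest root so the template can be
        # unpacked under any new directory.
        for child in sorted(guest_root.iterdir()):
            tar.add(child, arcname=child.name)


def deploy_from_template(archive: Path, new_guest_root: Path) -> None:
    """Unpack a template into a fresh directory for a new guest."""
    # Refuse to overwrite an existing guest.
    new_guest_root.mkdir(parents=True, exist_ok=False)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(new_guest_root)
```

A call such as `deploy_from_template(Path("template.tar.gz"), Path("guests/hadoop-test"))` (both paths made up for the example) would then give a second, independent copy of the template's file tree.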

    The guest is chroot'ed. I can easily copy files into the guest using
    copy commands. Think ant deploy -DTARGETDIR=/path/to/guest.

    Oliver Fischer wrote:
    But it is horribly slow if you do not have enough RAM and multiple
    disks, since all I/O operations go to the same disk.

    VServer will not solve this problem, but at least you won't be losing
    I/O to 'emulation'.

    If you are working with Hadoop and you need to be able to have
    multiple versions running, with different configurations, take a look
    at VServer.
  • Steve Loughran at Mar 30, 2009 at 11:33 am

    Oliver Fischer wrote:
    Hello Vishal,

    I did the same a few weeks ago. The most important fact is that it
    works. But it is horribly slow if you do not have enough RAM and multiple
    disks, since all I/O operations go to the same disk.

    They may go to separate disks underneath, but performance is bad because
    what the virtual OS thinks is a raw hard disk could be a badly fragmented
    bit of storage on the container OS.

    Memory is another point of conflict; your VMs will swap out or block
    other VMs.

    0. Keep different VM virtual disks on different physical disks. Fast
    disks at that.
    1. Pre-allocate your virtual disks.
    2. Defragment at both the VM and host OS levels.
    3. Crank back the schedulers so that the VMs aren't competing too much
    for CPU time. One core for the host OS, one for each VM.
    4. You can keep an eye on performance by looking at the clocks of the
    various machines: if they pause and get jittery, then they are being
    swapped out.
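
The clock-watching tip in item 4 could be approximated with a small script run inside each guest. This is only a sketch: the function names, sampling interval, and tolerance are assumptions, and whether a stalled guest actually sees the gap depends on how the hypervisor handles guest clocks, so treat the output as a rough indicator.

```python
import time


def find_pauses(timestamps, interval, tolerance=0.5):
    """Return (index, gap) for every sample that arrived noticeably late.

    A gap counts as a pause when it exceeds interval * (1 + tolerance).
    """
    limit = interval * (1 + tolerance)
    pauses = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > limit:
            pauses.append((i, gap))
    return pauses


def sample_clock(samples, interval):
    """Record the monotonic clock every `interval` seconds."""
    stamps = []
    for _ in range(samples):
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps
```

Calling `find_pauses(sample_clock(600, 1.0), 1.0)` in each VM while a job runs would list the samples where that VM appears to have been stalled for more than half a second beyond the expected interval.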

    Using multiple VMs on a single host is OK for testing, but not for hard
    work. You can use VM images to do work, but you need to have enough
    physical cores and RAM to match that of the VMs.
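
One way to sanity-check that last rule before starting the guests is a quick tally, sketched below. The `Guest` type, the function name, and the 1024 MB reserved for the host are assumptions for illustration; the one-core-for-the-host default follows item 3 above.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    cores: int
    ram_mb: int


def check_capacity(host_cores, host_ram_mb, guests,
                   host_reserved_cores=1, host_reserved_ram_mb=1024):
    """Return a list of human-readable problems; empty means the plan fits."""
    problems = []
    cores_needed = host_reserved_cores + sum(g.cores for g in guests)
    ram_needed = host_reserved_ram_mb + sum(g.ram_mb for g in guests)
    if cores_needed > host_cores:
        problems.append(f"cores: need {cores_needed}, have {host_cores}")
    if ram_needed > host_ram_mb:
        problems.append(f"ram: need {ram_needed} MB, have {host_ram_mb} MB")
    return problems


# Hypothetical three-guest layout on a 4-core, 8 GB host.
plan = [Guest("namenode", 1, 2048),
        Guest("datanode1", 1, 2048),
        Guest("datanode2", 1, 2048)]
print(check_capacity(4, 8192, plan) or "plan fits")
```

If the list is not empty, the guests will be competing for CPU or swapping, which is the situation described above.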

    -steve

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 25, '09 at 5:01a
active: Mar 30, '09 at 11:33a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
