Poor performance on word count test on a 10 node cluster.
Hi,

I have a 10 node cluster (IBM blade servers, 48GB RAM, 2x500GB Disk, 16 HT
cores).

I've uploaded 10 files to HDFS. Each file is 10GB. I used the streaming jar
with 'wc -l' as mapper and 'cat' as reducer.
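For reference, a streaming job like the one described typically looks something like this (a sketch only; the jar location varies by Hadoop version, and the input/output paths here are hypothetical placeholders):

```shell
# Hypothetical paths; adjust the jar location and HDFS paths
# to match your installation.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
  -input /user/gyorgy/input \
  -output /user/gyorgy/wc-out \
  -mapper 'wc -l' \
  -reducer cat
```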

I use 64MB block size and the default replication (3).

The wc on the 100 GB took about 220 seconds, which translates to about 3.5
Gbit/s processing speed. One disk can do sequential reads at about 1 Gbit/s, so
I would expect something around 20 Gbit/s (minus some overhead), and I'm
getting only 3.5.
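The arithmetic behind this estimate can be checked directly (a minimal sketch, assuming 20 independent disks at roughly 1 Gbit/s sequential read each, as stated above):

```python
# Observed: 100 GB read in about 220 seconds.
data_gbit = 100 * 8          # 100 GB expressed in gigabits
elapsed_s = 220
observed = data_gbit / elapsed_s   # ~3.6 Gbit/s

# Expected aggregate: 10 nodes x 2 disks, ~1 Gbit/s each.
disks = 10 * 2
expected = disks * 1.0             # 20 Gbit/s

print(f"observed:   {observed:.1f} Gbit/s")
print(f"expected:   {expected:.1f} Gbit/s")
print(f"efficiency: {observed / expected:.0%}")
```

So the cluster is delivering roughly a fifth of the raw aggregate disk bandwidth, which is the gap the question is about.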

Is my expectation valid?

I checked the jobtracker and it seems all nodes are working, each reading
the right blocks. I have not played with the number of mappers and reducers
yet. It seems the number of mappers is the same as the number of blocks, and
the number of reducers is 20 (there are 20 disks). This looks OK to me.

Thank you,
Gyorgy


  • Bharath Mundlapudi at May 16, 2011 at 7:56 am
    There are lots of other things you are not considering in this equation.

    For example, the reducer output of each block is written 3 times, since replication is 3. What is the network setup? Is it a 1 Gbps link? Check whether there are any issues with the network as well.

    Check if there are any straggler nodes. This can affect your map or reduce time. Of course, there are other beasts like tuning the cluster, queue time and scheduling.

    Try increasing the block size.
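A rough sketch of why block size matters here, using the numbers from the original post (100 GB of input, 64 MB blocks); the per-task startup cost is an assumption for illustration:

```python
# Number of map tasks is roughly input size / block size.
input_mb = 100 * 1024  # 100 GB of input, in MB
for block_mb in (64, 128, 256):
    map_tasks = input_mb // block_mb
    print(f"{block_mb} MB blocks -> {map_tasks} map tasks")

# With 1600 tasks at 64 MB blocks, even a few seconds of task
# startup and scheduling overhead per wave adds up across only
# 10 nodes; larger blocks mean fewer, longer-running tasks.
```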

    -Bharath



    ________________________________
    From: György Balogh <bogyom74@gmail.com>
    To: common-user@hadoop.apache.org
    Sent: Monday, May 16, 2011 12:15 AM
    Subject: Poor performance on word count test on a 10 node cluster.


Discussion Overview
group: common-user
categories: hadoop
posted: May 16, '11 at 7:16a
active: May 16, '11 at 7:56a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
