Hi,
We are planning a 100 TB cluster with M/R jobs running on 15 GB gzip files.
Should we configure the HDFS block size to be 128 MB or 256 MB?

Thanks,
Lior
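
The block size in HDFS is a per-file attribute fixed at write time; hdfs-site.xml only supplies the cluster-wide default. A minimal sketch using the Hadoop 1.x-era Java API of the period, assuming the "dfs.block.size" property (the 1.x name; later releases spell it "dfs.blocksize") — the path and values are illustrative, not from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Cluster-wide default, normally set once in hdfs-site.xml;
            // "dfs.block.size" is the Hadoop 1.x property name.
            conf.setLong("dfs.block.size", 256L * 1024 * 1024); // 256 MB

            FileSystem fs = FileSystem.get(conf);

            // The default can also be overridden per file at create time:
            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(
                    new Path("/data/example.gz"), // hypothetical path
                    true,                         // overwrite
                    4096,                         // I/O buffer size
                    (short) 3,                    // replication factor
                    512L * 1024 * 1024);          // 512 MB blocks for this file
            out.close();
            fs.close();
        }
    }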

  • Harsh J at Mar 17, 2011 at 9:02 am
    15 GB single gzip files? Consider block sizes of 0.5 GB or more. But it
    also depends on the processing slot capacity you have. Since gzip is
    not splittable, each file is consumed by a single map task regardless
    of block size, so smaller blocks buy no extra parallelism; they only
    put a higher load on the NameNode, which must track many blocks (and
    the replicas of each) per file.


    --
    Harsh J
    http://harshj.com
  • Lior Schachter at Mar 17, 2011 at 9:22 am
    We have altogether 15 GB of data to process every day (multiple M/R
    jobs running over the same set of data). Currently we split this data
    into 60 files (but we could also split it into 120 files).

    We have 15 machines, each with a quad-core CPU.

    Thanks,
    Lior
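
Putting the numbers from both messages together, a back-of-the-envelope check (the one-mapper-per-core slot count is an assumption, not a figure from the thread):

    15 GB/day in 60 files    ->  ~256 MB per file
    15 machines x 4 cores    ->  ~60 map slots (assuming 1 mapper per core)
    gzip not splittable      ->  1 map task per file, so 60 files fill
                                 60 slots in a single wave

    Blocks the NameNode tracks per file (times 3 with default replication):
      256 MB file, 128 MB blocks  ->    2
      256 MB file, 256 MB blocks  ->    1
       15 GB file, 128 MB blocks  ->  120
       15 GB file, 512 MB blocks  ->   30

So if the daily data is pre-split into ~256 MB files, a 256 MB block size keeps each file in a single block; the 0.5 GB+ suggestion applies to the single-15 GB-file reading of the original question.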
