Is there a setting in hadoop-site.xml that affects blocksize written to disk?
Hello,

I know that in hadoop-site.xml, dfs.blocksize sets the size of the HDFS
block. However, I've noticed that this does not necessarily correspond
to the block size as data is written to disk in dfs.data.dir.

Is there another setting where we can modify the block size written to
disk?
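
For reference, a minimal sketch of a block-size override in
hadoop-site.xml (in 0.20-era Hadoop the property is usually spelled
dfs.block.size, with dfs.blocksize as the later name; the 128 MB value
here is just an example):

    <configuration>
      <property>
        <!-- HDFS block size in bytes; 134217728 = 128 MB -->
        <name>dfs.block.size</name>
        <value>134217728</value>
      </property>
    </configuration>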

Thanks for the help.

Best regards,

Danny

Danny B. Gross
Solutions Engineering
Spansion, Inc.

  • Todd Lipcon at Jul 16, 2009 at 5:54 pm
    Hi Danny,

    I'm not an expert in this code path, but looking at the code, it looks to me
    like the writes into the blocks in dfs.data.dir aren't buffered in any way.
    The writes as they're received over the network protocol by BlockReceiver
    are simply passed through to the underlying filesystem. So it's the DFSClient
    that determines the size of the writes actually made to disk.
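
    To illustrate that pass-through behavior, a purely schematic sketch
    (not the actual BlockReceiver code, which also handles checksums,
    pipelining, and acks): whatever arrives from the network is handed to
    the block file as-is, with no intermediate buffering, so write sizes
    on disk track the sender's packet sizes.

        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.io.InputStream;

        // Schematic only: shows the lack of buffering between the network
        // read and the disk write, not HDFS's real receive path.
        class PassThroughReceiver {
            void receiveBlock(InputStream net, FileOutputStream blockFile)
                    throws IOException {
                byte[] buf = new byte[64 * 1024]; // one default-size packet
                int n;
                while ((n = net.read(buf)) != -1) {
                    blockFile.write(buf, 0, n); // passed straight through
                }
            }
        }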

    Looking at DFSClient, it appears to me that it does writes based on the
    undocumented configuration variable dfs.write.packet.size, which defaults to
    64KB. In that packet, it has to encode some lengths, plus checksums, plus
    the data itself. Given that the checksums are CRC32, we have 4 bytes
    overhead for every 512 (by default) bytes of data, so we should get about
    63KB of actual data into each packet. So I'd figure that with default
    settings, the writes going to the underlying filesystem will be about that size.
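
    A quick back-of-the-envelope check of that estimate, as a sketch
    assuming the defaults named above (a 64KB dfs.write.packet.size,
    512-byte checksum chunks from io.bytes.per.checksum, 4-byte CRC32
    values) and ignoring the small fixed per-packet header:

        public class PacketMath {
            public static void main(String[] args) {
                final int packetSize = 64 * 1024;  // dfs.write.packet.size default
                final int bytesPerChecksum = 512;  // io.bytes.per.checksum default
                final int crcSize = 4;             // CRC32 checksum per chunk
                // Each 512-byte data chunk costs 512 + 4 bytes in the packet.
                int chunksPerPacket = packetSize / (bytesPerChecksum + crcSize);
                int dataPerPacket = chunksPerPacket * bytesPerChecksum;
                // 127 chunks * 512 bytes = 65024 bytes, roughly 63.5 KB,
                // minus the header for lengths: hence "about 63KB".
                System.out.println(dataPerPacket + " bytes of data per packet");
            }
        }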

    These are configuration parameters that are rarely tweaked and I wouldn't
    suggest changing them. The size of writes to the underlying FS is generally
    inconsequential compared to other overheads in HDFS.

    Hope that helps
    -Todd

  • Gross, Danny at Jul 16, 2009 at 5:55 pm
    Todd,

    I really appreciate this info! Thanks a bunch for responding so
    quickly!

    Best regards,

    Danny
