Hi all,

When a datanode receives a block, it
writes the block to two streams on disk:
- the data stream (dataOut)
- the checksum stream (checksumOut)

checksumOut is created with the following code:
    this.checksumOut = new DataOutputStream(new BufferedOutputStream(
        streams.checksumOut,
        SMALL_BUFFER_SIZE));
whereas dataOut is simply a FileOutputStream.

So checksumOut is buffered, but dataOut is not.

Is there any particular reason for doing so?
Or does it not matter, because we flush
both streams afterwards anyway?

Thanks
Thanh


  • Eli Collins at Nov 6, 2010 at 3:36 am
    Hey Thanh,

    Data gets written in 64 KB packets, so there doesn't seem to be a need
    to buffer it.

    Thanks,
    Eli
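
    To illustrate the point about packet size, here is a minimal sketch, not the actual datanode code. It counts how many write calls reach the underlying sink when large data packets are written directly and when much smaller checksum writes go through a BufferedOutputStream. CountingSink, writePackets, the 512 checksum bytes per packet, and the 4096-byte buffer are all assumptions made for the example; only the 64 KB packet size comes from the reply above.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamBufferingSketch {
    /** Hypothetical sink that counts the write calls and bytes reaching it. */
    static final class CountingSink extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    static final int PACKET_SIZE = 64 * 1024;          // packet size mentioned in the thread
    static final int CHECKSUM_BYTES_PER_PACKET = 512;  // assumption: 128 chunks x 4-byte checksum
    static final int ASSUMED_BUFFER_SIZE = 4096;       // assumption: stand-in for SMALL_BUFFER_SIZE

    /**
     * Writes the given number of packets and returns
     * {data sink calls, checksum sink calls, checksum sink bytes}.
     */
    static long[] writePackets(int packets, boolean bufferChecksums) {
        CountingSink dataSink = new CountingSink();
        CountingSink checksumSink = new CountingSink();
        OutputStream dataOut = dataSink; // unbuffered, like a bare FileOutputStream
        OutputStream checksumOut = bufferChecksums
                ? new BufferedOutputStream(checksumSink, ASSUMED_BUFFER_SIZE)
                : checksumSink;
        byte[] data = new byte[PACKET_SIZE];
        byte[] sums = new byte[CHECKSUM_BYTES_PER_PACKET];
        try {
            for (int i = 0; i < packets; i++) {
                dataOut.write(data, 0, data.length);     // one large write per packet
                checksumOut.write(sums, 0, sums.length); // one small write per packet
            }
            dataOut.flush();
            checksumOut.flush(); // pushes any bytes still held in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {dataSink.calls, checksumSink.calls, checksumSink.bytes};
    }

    public static void main(String[] args) {
        long[] direct = writePackets(16, false);
        long[] buffered = writePackets(16, true);
        System.out.println("data sink writes:                " + direct[0]);
        System.out.println("checksum sink writes (direct):   " + direct[1]);
        System.out.println("checksum sink writes (buffered): " + buffered[1]);
    }
}
```

    Under these assumptions, each data write is already large, so a buffer in front of dataOut would mostly add a copy, while the small checksum writes are coalesced into fewer, larger writes to the sink.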
  • Thanh Do at Nov 6, 2010 at 3:15 pm
    Thanks Eli,

    I got it now.

Discussion Overview
group: hdfs-dev
categories: hadoop
posted: Nov 4, '10 at 9:58p
active: Nov 6, '10 at 3:15p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
