Hello folks,

The HDFS design doc says that the client buffers a full block's worth of
data before contacting the NameNode for DataNode information, and that this
is optimal for network throughput.
However, I could not find this buffering logic in the source code.

In DFSClient.DataStreamer, the streamer waits until dataQueue is not empty,
then contacts the NameNode and builds a pipeline. The number of packets in
dataQueue is always 1 when this happens!
I am confused here. Can anyone correct me if I am wrong?
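
The observation above can be modeled with a toy, single-threaded sketch. It is
not the real DFSClient code: the class name PacketBufferSketch, the method
streamerStep, and the 64 KiB PACKET_SIZE are all assumptions made for
illustration, and the real DataStreamer runs as a separate thread. The sketch
only shows that if the streamer reacts as soon as the queue is non-empty, one
packet (not one block) has been buffered when the NameNode is first contacted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy, single-threaded model of the client write path; not the real DFSClient code. */
public class PacketBufferSketch {
    // Assumed packet size, for illustration only.
    public static final int PACKET_SIZE = 64 * 1024;

    // Hypothetical stand-in for the client's packet queue (stores packet lengths).
    private final Deque<Integer> dataQueue = new ArrayDeque<>();
    private int currentPacketBytes = 0;    // bytes in the packet being filled
    private boolean pipelineReady = false; // true once the "NameNode" has been asked
    private long bytesWritten = 0;

    public int queueSizeAtSetup = -1;      // packets queued when the pipeline is set up
    public long bytesWrittenAtSetup = -1;  // bytes accepted from the writer at that moment
    public long bytesSent = 0;             // bytes handed to the (imaginary) pipeline

    /** Writer side: accept bytes, enqueueing a packet each time one fills up. */
    public void write(int numBytes) {
        while (numBytes > 0) {
            int n = Math.min(numBytes, PACKET_SIZE - currentPacketBytes);
            currentPacketBytes += n;
            bytesWritten += n;
            numBytes -= n;
            if (currentPacketBytes == PACKET_SIZE) {
                dataQueue.addLast(currentPacketBytes);
                currentPacketBytes = 0;
                streamerStep(); // models the streamer waking up on a non-empty queue
            }
        }
    }

    /** Streamer side: sets up the pipeline on first use, then drains the queue. */
    private void streamerStep() {
        if (dataQueue.isEmpty()) {
            return;
        }
        if (!pipelineReady) {
            // Stand-in for asking the NameNode for DataNodes and building the pipeline.
            queueSizeAtSetup = dataQueue.size();
            bytesWrittenAtSetup = bytesWritten;
            pipelineReady = true;
        }
        while (!dataQueue.isEmpty()) {
            bytesSent += dataQueue.removeFirst();
        }
    }

    public static void main(String[] args) {
        PacketBufferSketch sketch = new PacketBufferSketch();
        sketch.write(1024 * 1024); // write 1 MiB, far less than a typical block
        System.out.println("packets queued at pipeline setup: " + sketch.queueSizeAtSetup);
        System.out.println("bytes buffered at pipeline setup: " + sketch.bytesWrittenAtSetup);
    }
}
```

In this model the pipeline is requested after exactly one packet has been
queued, which matches the behavior described for DataStreamer.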

  • Hairong Kuang at Aug 10, 2010 at 4:55 pm
    The client only buffers one packet before it contacts the NameNode to
    allocate DataNodes for the block. The doc you read might be out of date.

    Hairong
  • Elton sky at Aug 11, 2010 at 5:49 am
    Thanks, Hairong.

    Then I wonder about the reason for this design change.
    Buffering a block's worth of data on the client side before starting the
    transfer would increase network throughput, but it also uses more memory.
    I think a MapReduce job is, in most cases, more memory-bound (for example
    in the shuffle phase) than network-bound. Is this the reason?

Discussion Overview
group: hdfs-user @
categories: hadoop
posted: Aug 10, '10 at 2:14a
active: Aug 11, '10 at 5:49a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
