Hello,

I am using FUSE-DFS with HDFS for a project, and I have to modify the read and
write functions of fuse_dfs. I have a few questions regarding the
fuse_dfs_read code. There is an rdbuffer_size variable associated with the
dfs_context, which is initialized to 10M by default. What is this
rdbuffer_size and what is it used for?

Secondly, in the fuse_dfs_read function, there are two places where
hdfsPread() is called in a loop. First, there is a check for whether the
requested read size is greater than the value of rdbuffer_size. hdfsPread is
executed only if it is. In this case, the data is read into the
buffer passed from the caller.

In the second case, hdfsPread is executed if a valid buffer is
associated with the dfs file handle fh and the size and offset of the read
request lie within the range of fh->buf. In this case, the data is read
into fh->buf.

Could someone explain what is happening here?

Thanks,
Aastha.

--
Aastha Mehta
B.E. (Hons.) Computer Science
BITS Pilani
E-mail: [email protected]


  • Brian Bockelman at Sep 7, 2011 at 12:16 pm
    Hi Aastha,

    A read-ahead buffer is a common technique to trade higher bandwidth for lower latency for a number of common read patterns. Your OS does something similar (though with a much more advanced technique). By reading ahead, HDFS is betting that your reads have a pattern to them. I think the 10MB default is a touch excessive (it made more sense in previous releases). I use 32KB.

    The buffer is not used if you have very large reads, as it doesn't provide any benefit.

    Brian
  • Aastha Mehta at Sep 9, 2011 at 7:37 pm
    Thanks Brian. That helped.

    Regards,
    Aastha.
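
To make the two read paths discussed above concrete, here is a minimal C sketch of a read function with a read-ahead buffer. It is an illustration under stated assumptions, not the fuse_dfs source: `fake_pread` stands in for hdfsPread() and reads from an in-memory string, `RDBUFFER_SIZE` plays the role of rdbuffer_size (set to 8 bytes so the behaviour is easy to trace), and `file_handle`, `read_fully` and `buffered_read` are hypothetical names. In this sketch the buffer is refilled when a small request falls outside the range currently held in the buffer, and the caller's bytes are then copied out of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define RDBUFFER_SIZE 8 /* hypothetical; stands in for rdbuffer_size */

/* Hypothetical in-memory "file" standing in for a file stored in HDFS. */
static const char backing[] = "abcdefghijklmnopqrstuvwxyz";
static int remote_reads = 0; /* counts calls to the stand-in for hdfsPread */

/* Stand-in for hdfsPread: positional read, may be short, returns 0 at EOF. */
static ssize_t fake_pread(off_t pos, char *dst, size_t len) {
    size_t file_len = sizeof backing - 1;
    remote_reads++;
    if (pos < 0) return -1;
    if ((size_t)pos >= file_len) return 0;
    size_t avail = file_len - (size_t)pos;
    size_t n = len < avail ? len : avail;
    memcpy(dst, backing + pos, n);
    return (ssize_t)n;
}

/* Call the positional read in a loop until len bytes, EOF, or an error. */
static ssize_t read_fully(off_t pos, char *dst, size_t len) {
    size_t total = 0;
    while (total < len) {
        ssize_t n = fake_pread(pos + (off_t)total, dst + total, len - total);
        if (n < 0) return -1;
        if (n == 0) break; /* EOF */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Per-open-file state: the read-ahead buffer and the file range it holds. */
struct file_handle {
    char buf[RDBUFFER_SIZE];
    off_t buf_start;
    size_t buf_len;
};

static ssize_t buffered_read(struct file_handle *fh, char *out,
                             size_t size, off_t offset) {
    assert(fh != NULL && out != NULL && offset >= 0);

    /* Path 1: large request. Read straight into the caller's buffer;
       read-ahead would provide no benefit here. */
    if (size > RDBUFFER_SIZE)
        return read_fully(offset, out, size);

    /* Path 2: small request. Refill the buffer if it is empty or the
       request is not fully inside the buffered range. */
    int miss = fh->buf_len == 0 || offset < fh->buf_start ||
               (size_t)(offset - fh->buf_start) + size > fh->buf_len;
    if (miss) {
        ssize_t n = read_fully(offset, fh->buf, RDBUFFER_SIZE);
        if (n < 0) return -1;
        fh->buf_start = offset;
        fh->buf_len = (size_t)n;
    }

    /* Serve the request from the buffer (short only near EOF). */
    size_t skip = (size_t)(offset - fh->buf_start);
    size_t avail = fh->buf_len - skip;
    size_t n = size < avail ? size : avail;
    memcpy(out, fh->buf + skip, n);
    return (ssize_t)n;
}
```

With a zero-initialized `struct file_handle`, a 3-byte read at offset 0 triggers one backing read that fills the 8-byte buffer, and a following 3-byte read at offset 3 is served from the buffer without another backing read. A 12-byte read skips the buffer entirely. A real implementation would also have to consider concurrent reads on the same handle, which this sketch ignores.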

Discussion Overview
group: hdfs-dev @
categories: hadoop
posted: Sep 7, '11 at 5:46a
active: Sep 9, '11 at 7:37p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
