FAQ
Hi, all

I'm confused by the question of how HDFS decides where to put the data blocks.

I mean, when a user invokes a command like "./hadoop put ***", and we assume the file consists of 3 blocks, how does HDFS decide where these 3 blocks should be placed?

Most of the materials don't cover this issue; they just introduce data replicas when talking about blocks in HDFS.

Can anyone give me some pointers?

Thanks

Nan

--
Nan Zhu
School of Software,5501
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China
E-Mail: zhunansjtu@gmail.com


  • Kai Voigt at Apr 19, 2011 at 2:28 pm
    Hi,

    I found http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html explains the process nicely.

    The first replica of each block will be stored on the client machine, if it's a datanode itself. Makes sense, as it doesn't require a network transfer. Otherwise, a random datanode will be picked for the first replica.

    The second replica will be written to a random datanode on a different rack from the one where the first replica is stored. This is where HDFS's rack awareness is utilized, so the data would survive a rack failure.

    The third replica will be written to the same rack as the second replica, but to another random datanode in that rack. That keeps the write pipeline between the second and third replicas fast.

    Does that make sense to you? Note that this is the current hard-coded policy; there are ideas to make the policy customizable (https://issues.apache.org/jira/browse/HDFS-385).
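    The three rules above can be sketched as a toy simulation. To be clear, this is not Hadoop code: the function name `place_replicas`, the sample topology, and the node names are all illustrative, and it ignores real-world details such as node load and free space.

    ```python
    import random

    def place_replicas(topology, writer):
        """Toy model of HDFS's default 3-replica placement.

        topology maps datanode name -> rack name; writer is the client host.
        Returns the datanodes chosen for one block's replicas.
        """
        nodes = list(topology)
        # First replica: on the writer if it is itself a datanode
        # (no network transfer needed), otherwise a random datanode.
        first = writer if writer in topology else random.choice(nodes)
        # Second replica: a random datanode on a *different* rack,
        # so the block survives a whole-rack failure.
        off_rack = [n for n in nodes if topology[n] != topology[first]]
        second = random.choice(off_rack)
        # Third replica: another datanode on the *same* rack as the
        # second, keeping the tail of the write pipeline within one rack.
        same_rack = [n for n in nodes
                     if topology[n] == topology[second] and n != second]
        third = random.choice(same_rack)
        return [first, second, third]

    topology = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}
    replicas = place_replicas(topology, writer="dn1")
    ```

    Running this with `writer="dn1"` always yields dn1 first, then an off-rack node, then that node's rack-mate.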

    Kai

    --
    Kai Voigt
    k@123.org
  • Real great.. at Apr 19, 2011 at 2:46 pm
    In this context I would like to ask, can we actually place the data where we
    wish instead of allowing Hadoop's intelligence to take care of this?




    --
    Regards,
    R.V.
  • Harsh J at Apr 19, 2011 at 7:26 pm
    Hey Rahul,

    As Kai pointed out, from 0.21 upwards the block placement policy has
    been made pluggable. You merely have to set your custom class as the
    value of "dfs.block.replicator.classname" in your configuration. Your
    class should also implement the BlockPlacementPolicy interface.
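    A minimal sketch of that configuration, for hdfs-site.xml; the class name com.example.MyPlacementPolicy is a made-up placeholder for your own implementation:

    ```xml
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>com.example.MyPlacementPolicy</value>
    </property>
    ```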

    --
    Harsh J

Discussion Overview
group: common-user @ hadoop
posted: Apr 18, 2011 at 1:46 pm
active: Apr 19, 2011 at 7:26 pm
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
