If a cluster has many datanodes and I want to copy a large file into DFS with
the replication factor set to 1, will the namenode put the file's data on one
datanode or on several? I wonder whether the file will be split into blocks,
with different blocks stored on different datanodes.

--
View this message in context: http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

  • Alex Loddengaard at Apr 23, 2009 at 6:16 pm
    I believe the blocks will be distributed across datanodes rather than kept
    on a single datanode. If this weren't the case, an MR job run on the file
    could only be data-local on one TaskTracker.
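    A toy illustration of this point, assuming replication = 1 and made-up
    datanode names (dn1 to dn4); local_tasks_per_host is a hypothetical helper,
    not a Hadoop API. A map task over a block can only be node-local on a host
    that stores that block, so the hosts in the result are the TaskTrackers
    that could run local map tasks.

```python
from collections import defaultdict


def local_tasks_per_host(block_hosts):
    """Group block indices by the datanode that stores them.

    block_hosts[i] is the only datanode holding block i (replication = 1).
    A map task over block i is node-local only on that host, so the keys of
    the result are the hosts whose TaskTracker could run local map tasks.
    """
    per_host = defaultdict(list)
    for block, host in enumerate(block_hosts):
        per_host[host].append(block)
    return dict(per_host)


# Hypothetical 8-block file with replication = 1.
all_on_one = ["dn1"] * 8                   # every block on one datanode
spread = ["dn1", "dn2", "dn3", "dn4"] * 2  # blocks spread over four datanodes

print(local_tasks_per_host(all_on_one))  # {'dn1': [0, 1, 2, 3, 4, 5, 6, 7]}
print(local_tasks_per_host(spread))      # {'dn1': [0, 4], 'dn2': [1, 5], 'dn3': [2, 6], 'dn4': [3, 7]}
```

    With the first placement only one TaskTracker can read the file locally;
    with the second, four can work on local data in parallel.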

    Alex
    On Thu, Apr 23, 2009 at 2:14 AM, Xie, Tao wrote:

  • Amr Awadallah at Apr 23, 2009 at 8:57 pm
    Yes, it will be split across many nodes, and if possible each block will be
    placed on a different datanode.

    See the following link for more details:

    http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Organization
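    A minimal sketch of the split-then-spread behavior described above,
    assuming a 64 MB block size (the dfs.block.size default in Hadoop releases
    of that era) and replication = 1. place_blocks is a deliberately simplified,
    hypothetical stand-in, not the real HDFS placement policy: it only avoids
    reusing a datanode for another block of the same file until every datanode
    has received one.

```python
import random

BLOCK_SIZE = 64 * 1024 * 1024  # assumed dfs.block.size (64 MB default of that era)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block; the last block may be short."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def place_blocks(num_blocks, datanodes, rng=random):
    """Toy placement with replication = 1: give each block a datanode that
    holds no other block of this file while one is available, then start a
    new round. This is NOT the real HDFS policy, which also weighs racks,
    free space, load and the writer's own node."""
    placement = []
    unused = []
    for _ in range(num_blocks):
        if not unused:  # every datanode used once: begin a new round
            unused = list(datanodes)
            rng.shuffle(unused)
        placement.append(unused.pop())
    return placement


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
print([length // (1024 * 1024) for _, length in blocks])  # [64, 64, 64, 8]
print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4", "dn5"]))
```

    With five datanodes and four blocks, each block gets its own datanode; a
    file with more blocks than datanodes necessarily reuses some of them.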

    -- amr

  • Jerome Banks at Apr 23, 2009 at 9:21 pm
    FYI, the pipe v2 results were created with com.quantcast.armor.jobs.pipev3.util.CountVG, using the results from com.quantcast.armor.jobs.pipev3.util.MyHarvestV2 (the mainline pipev2 harvest) as input.
    The pipe v3 results were a one day run of BloomDaily for 04/12/2009.
    The CSV files were generated with TopNFlow.

  • Jason hadoop at Apr 24, 2009 at 1:32 am
    I haven't checked the code for any special cases of replication = 1.
    The sequence for writing a block is:

    1. Get a list of datanodes from the namenode for the block replicas. If
    the requesting host is itself a datanode, it is the first datanode
    returned.
    2. Send the block, together with the list of datanodes that should
    receive it, to the first datanode in the list.
    3. That datanode sends the block on to the next datanode in the list.
    4. Step 3 repeats until the block is fully replicated.
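    The four steps above can be sketched as a small simulation. This is a toy
    model, not HDFS code: choose_targets and write_block are hypothetical
    names, the namenode's choice is reduced to a random shuffle (real placement
    also considers racks, free space and load), and "sending" a block is just
    appending it to an in-memory dict. Under this model, step 1 means that with
    replication = 1 and a writer that is itself a datanode, every block's only
    replica lands on the writer; the blocks can only spread out when the writer
    is not a datanode.

```python
import random


def choose_targets(writer, datanodes, replication, rng):
    """Step 1 (toy namenode): pick up to `replication` distinct datanodes,
    listing the writer first when the writer is itself a datanode."""
    others = [dn for dn in datanodes if dn != writer]
    rng.shuffle(others)
    if writer in datanodes:
        return [writer] + others[:replication - 1]
    return others[:replication]


def write_block(block, targets, stores):
    """Steps 2-4: hand the block and the remaining target list to the first
    datanode; each datanode stores the block and forwards it downstream."""
    def receive(node, downstream):
        stores.setdefault(node, []).append(block)
        if downstream:  # step 3: pass the block to the next datanode
            receive(downstream[0], downstream[1:])

    receive(targets[0], targets[1:])  # step 2: client -> first datanode


rng = random.Random(42)
nodes = ["dn1", "dn2", "dn3", "dn4"]

# Replication = 1, copying from a machine that is itself datanode dn1.
local = {}
for b in range(4):
    write_block(b, choose_targets("dn1", nodes, 1, rng), local)
print(local)  # {'dn1': [0, 1, 2, 3]}

# Replication = 1, copying from a client machine that is not a datanode.
remote = {}
for b in range(4):
    write_block(b, choose_targets("client", nodes, 1, rng), remote)
print(remote)  # blocks end up on randomly chosen datanodes
```

    The recursive receive mirrors the pipeline: the client only talks to the
    first datanode, and each datanode is responsible for forwarding to the
    next one in the list.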

    --
    Alpha Chapters of my book on Hadoop are available
    http://www.apress.com/book/view/9781430219422

Discussion Overview
group: common-user
categories: hadoop
posted: Apr 23, '09 at 9:15a
active: Apr 24, '09 at 1:32a
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop