FAQ
We are trying to load data into HDFS from one of the slaves. When the put
command is run from a slave (datanode), all of the blocks are written to that
datanode's HDFS storage and are not distributed to the other nodes in the
cluster. It does not seem to matter which destination format we use
(/filename vs. hdfs://master:9000/filename); it behaves the same either way.
Conversely, running the same command from the namenode distributes the
blocks across the datanodes.

Is there something I am missing?

-Nathan


  • Ken Goodhope at Jul 13, 2010 at 12:48 am
    All writes from a datanode leave one copy on the local node, one copy
    on another node in the same rack, and a third on another rack if
    available.
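    A concrete illustration of that policy, as a sketch (assumes the default
    replication factor of 3 and a hypothetical path; the placement in the
    comments mirrors Ken's description):

        # put run on a machine hosting a datanode; each block's replicas
        # land roughly as follows under the policy described above:
        #   replica 1 -> the local datanode itself
        #   replica 2 -> another node in the same rack
        #   replica 3 -> a node on another rack, if one is available
        hadoop fs -put data.txt /user/nathan/data.txt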
  • C.V.Krishnakumar at Jul 13, 2010 at 4:41 pm
    Hi,
I am a newbie. I am curious to know how you discovered that all the blocks are written to the datanode's HDFS storage? I thought replication by the namenode was transparent. Am I missing something?
    Thanks,
    Krishna
  • Nathan Grice at Jul 13, 2010 at 4:46 pm
To test the block distribution, run the same put command from the NameNode
and then again from a DataNode, and check the HDFS filesystem after each
run. In my case, a 2GB file was distributed fairly evenly across the
datanodes when put was run on the NameNode, but was stored only on the
DataNode where I ran the put command.
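    For reference, a minimal version of this comparison (the file name and
    paths are illustrative; fsck reports which datanodes hold each block's
    replicas):

        # step 1: run on the NameNode
        hadoop fs -put big.dat /test/from-namenode.dat
        # step 2: run on a DataNode
        hadoop fs -put big.dat /test/from-datanode.dat

        # compare where the replicas of each file ended up
        hadoop fsck /test/from-namenode.dat -files -blocks -locations
        hadoop fsck /test/from-datanode.dat -files -blocks -locations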
  • Allen Wittenauer at Jul 13, 2010 at 4:52 pm
When you write on a machine running a datanode process, the first replica of every block is *always* written locally. This is an optimization for the MapReduce framework. The lesson here is that you should *never* use a datanode machine to load your data; always do it from outside the grid.

Additionally, you can use hadoop fsck <filename> -files -blocks -locations to see where those blocks have been written.
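
    A sketch of the recommended approach, run from a client machine that is
    not part of the grid (the namenode URI matches the one from the original
    question; the file and destination path are illustrative):

        # no datanode runs on this client, so the namenode is free to spread
        # all replicas across the cluster by its normal placement policy
        hadoop fs -put big.dat hdfs://master:9000/test/from-edge.dat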
  • C.V.Krishnakumar at Jul 13, 2010 at 5:23 pm
    Oh. Thanks for the reply.
    Regards,
    Krishna
