Hi,

I have a small question about how HDFS manages files internally.

Assume I have one NameNode and 2 DataNodes. I have copied a CSV file of
size 80 MB into HDFS using the 'hadoop fs -copyFromLocal' command.

How will this file be stored in HDFS?

Will it be split into two parts, one of 64 MB (the default block size) and
the remaining 16 MB, and copied to the 2 DataNodes?

If that is the case and I run a MapReduce job on the two DataNodes, I may
get unexpected results, because the split does not respect line boundaries.

How can I solve this kind of issue? Please help me.



Thanks & Regards
Shanmukhan.B


  • Harsh J at Dec 29, 2010 at 5:12 am
    TextInputFormat (the FileInputFormat used for plain text) takes care
    of line boundaries in splits, so you don't need to worry about that.

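    Each mapper works on a FileSplit, which contains a starting offset
    and a length from there. The record reader handles lines that
    straddle a split boundary: a reader whose split does not start at
    offset 0 skips the partial first line, and every reader reads past
    the end of its split to finish its last line (the extra bytes are
    pulled from the DataNode that has them).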
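The boundary handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's actual LineRecordReader code; the names `make_splits` and `read_split` are made up for this example, and a byte string stands in for the file in HDFS.

```python
def make_splits(total_size, split_size):
    """Cut [0, total_size) into (start, length) pairs by byte count alone."""
    return [(start, min(split_size, total_size - start))
            for start in range(0, total_size, split_size)]


def read_split(data, start, length):
    """Return the lines owned by the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: the line that contains or begins at `start`
        # is read by the previous reader, so skip past the next newline.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    lines = []
    # A line that begins at or before `end` is ours, even if it finishes
    # beyond the split (those bytes may live in another block).
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)  # final line without a trailing newline
        lines.append(data[pos:newline])
        pos = newline + 1
    return lines


# Tiny stand-in for the CSV file, cut into 10-byte splits.
csv_data = b"id,name\n1,alice\n2,bob\n3,carol\n"
for start, length in make_splits(len(csv_data), 10):
    print((start, length), read_split(csv_data, start, length))
```

Whatever split size is chosen, every line comes out exactly once and in one piece, even though the cut points ignore line boundaries.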

    Similarly, SequenceFiles handle this with a special "sync" marker
    embedded between logical blocks of data, so a reader can find the
    next record boundary from any offset.
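The sync-marker idea can be sketched with a toy format. This is not the real SequenceFile layout: the fixed marker `SYNC`, the 4-byte length prefix, and the helpers `write_records` and `read_sync_split` are all assumptions made up for illustration. The sketch only shows how a reader that starts at an arbitrary byte offset can scan forward to the next marker and pick up whole records from there.

```python
import struct

# Hypothetical fixed 16-byte marker for this toy format. It cannot collide
# with the small length prefixes or ASCII payloads used below.
SYNC = b"\xff" * 16


def write_records(records, sync_every=2):
    """Serialize records as length-prefixed bytes, with a marker before
    every `sync_every` records."""
    out = bytearray()
    for index, record in enumerate(records):
        if index % sync_every == 0:
            out += SYNC
        out += struct.pack(">I", len(record)) + record
    return bytes(out)


def read_sync_split(buf, start, length):
    """Return the records of every group whose marker begins inside
    the split [start, start + length)."""
    end = start + length
    records = []
    pos = buf.find(SYNC, start)  # scan forward to the first marker
    while pos != -1 and pos < end:
        pos += len(SYNC)
        # Read whole records until the next marker or the end of the data,
        # even if that takes us past the end of the split.
        while pos < len(buf) and not buf.startswith(SYNC, pos):
            (size,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            records.append(buf[pos:pos + size])
            pos += size
        if pos >= len(buf):
            break
    return records


rows = [f"row-{i}".encode() for i in range(6)]
buf = write_records(rows)
for start in range(0, len(buf), 25):
    print(start, read_sync_split(buf, start, 25))
```

Each group of records is owned by the split in which its marker begins, so no record is lost or read twice when the file is cut at arbitrary byte offsets.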

    --
    Harsh J
    www.harshj.com

Discussion Overview
group: common-user @
category: hadoop
posted: Dec 29, '10 at 4:58a
active: Dec 29, '10 at 5:12a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
