Hi,
I have to upload a few terabytes of data, all in text files.

What would be a good option for doing so:

i) using hadoop fs -put to copy the text files directly onto HDFS, or

ii) copying the text files onto HDFS as sequence files? How much extra time
would (ii) take compared to (i)?

Thanks,
Jimmy


  • Chase Bradford at Feb 17, 2011 at 2:33 am
    We use sequence files for storing text data, and you definitely notice the cost of compressing on the client side while streaming to HDFS; if I remember correctly, it took about 10x as long. That drove us to using writer threads that fed off a single input stream a few thousand lines at a time and wrote to an HDFS directory with the desired name.
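    The multi-threaded writer pattern described above can be sketched as follows. This is a minimal illustration of the threading/batching structure only, not Hadoop API code: a plain local directory stands in for the HDFS output directory, plain part files stand in for compressed SequenceFiles, and all names (upload, writer, BATCH_SIZE, NUM_WRITERS) are hypothetical.

    ```python
    # Sketch of the pattern: several writer threads fed off a single input
    # stream, a few thousand lines at a time. In a real Hadoop setup each
    # writer would hold an open SequenceFile writer with a compression codec
    # instead of a local text file.
    import queue
    import threading
    from pathlib import Path

    BATCH_SIZE = 1000   # lines handed to a writer at a time (example value)
    NUM_WRITERS = 4     # parallel writer threads (example value)
    SENTINEL = None     # tells a writer thread to stop

    def writer(worker_id: int, batches: "queue.Queue", out_dir: Path) -> None:
        """Drain batches of lines and append them to this worker's part file."""
        part = out_dir / f"part-{worker_id:05d}"
        with part.open("a", encoding="utf-8") as f:
            while True:
                batch = batches.get()
                if batch is SENTINEL:
                    break
                f.writelines(line + "\n" for line in batch)

    def upload(lines, out_dir: Path) -> None:
        """Feed one input stream to several writers, one batch at a time."""
        batches: "queue.Queue" = queue.Queue(maxsize=NUM_WRITERS * 2)
        out_dir.mkdir(parents=True, exist_ok=True)
        threads = [
            threading.Thread(target=writer, args=(i, batches, out_dir))
            for i in range(NUM_WRITERS)
        ]
        for t in threads:
            t.start()
        batch = []
        for line in lines:
            batch.append(line)
            if len(batch) >= BATCH_SIZE:
                batches.put(batch)
                batch = []
        if batch:
            batches.put(batch)
        for _ in threads:          # one sentinel per writer thread
            batches.put(SENTINEL)
        for t in threads:
            t.join()
    ```

    The bounded queue keeps the reader from racing ahead of the (slower, compressing) writers, which is the point of the design: the compression cost is spread across several threads instead of serializing on one client-side stream.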
    On Feb 16, 2011, at 4:24 PM, Mapred Learn wrote:



Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Feb 17, '11 at 12:24a
active: Feb 17, '11 at 2:33a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion: Chase Bradford (1 post), Mapred Learn (1 post)
