On Feb 17, 2011 at 2:33 am:
We use sequence files for storing text data, and you definitely notice the cost of compressing client-side while streaming to HDFS; if I remember correctly, it took about 10x longer. That drove us to using writer threads that fed off a single input stream a few thousand lines at a time and wrote to an HDFS directory with the desired name.
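For what it's worth, the writer-thread pattern described above can be sketched roughly like this in Python. This is a simplified local illustration only: gzip part files on the local filesystem stand in for the SequenceFile writes to HDFS, and the chunk size and thread count are made-up values, not the ones actually used.

```python
import gzip
import queue
import threading
from pathlib import Path

CHUNK_LINES = 2000   # "a few thousand lines at a time" (illustrative value)
NUM_WRITERS = 4      # number of writer threads (assumption)

def upload_compressed(lines, out_dir, chunk_lines=CHUNK_LINES,
                      num_writers=NUM_WRITERS):
    """Feed chunks of lines from a single input stream to writer threads.

    Each writer compresses its chunk and writes it as a numbered part
    file under out_dir, so compression cost is spread across threads
    instead of serializing the whole upload.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    q = queue.Queue(maxsize=num_writers * 2)

    def writer():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more chunks coming
                q.task_done()
                break
            part_no, chunk = item
            # Compress and write this chunk as its own part file.
            with gzip.open(out_dir / f"part-{part_no:05d}.gz", "wt") as f:
                f.writelines(chunk)
            q.task_done()

    threads = [threading.Thread(target=writer) for _ in range(num_writers)]
    for t in threads:
        t.start()

    # Single reader feeds the shared queue, a chunk of lines at a time.
    chunk, part_no = [], 0
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_lines:
            q.put((part_no, chunk))
            chunk, part_no = [], part_no + 1
    if chunk:
        q.put((part_no, chunk))
        part_no += 1

    for _ in threads:             # one sentinel per writer thread
        q.put(None)
    for t in threads:
        t.join()
    return part_no                # number of part files written
```

Because chunks are numbered in input order and each goes to its own part file, concatenating the decompressed parts in filename order recovers the original stream. In the real setup you would replace the gzip writes with SequenceFile writes through the Hadoop API.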
On Feb 16, 2011, at 4:24 PM, Mapred Learn wrote:
I have to upload several terabytes of data consisting of text files.
What would be a good option to do so:
i) Using hadoop fs -put to copy the text files directly onto HDFS.
ii) Copying the text files as sequence files onto HDFS? How much extra time would this take compared to (i)?