I have input data files in the local filesystem: /input/path/*.log, but I
don't know anything about their sizes, the number of files, etc. If the
*.log files are small and there are lots of them, there is no reason to
start "bin/hadoop fs -put" for each file, because every start of
"bin/hadoop" is expensive (it spins up a new JVM).

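To avoid starting it once per file, one idea (a sketch, not tested against a real cluster) is to batch file names with xargs, so each start of the uploader handles a group of files. Here `echo` stands in for the real upload command — that substitution is an assumption so the sketch runs without Hadoop; each echo call prints one line, which lets us count how many times the "uploader" was started:

```shell
# Sketch: xargs groups many file names into a few command invocations,
# so the expensive per-start cost is paid once per batch, not per file.
# `echo` is a stand-in for the real upload command (assumption).
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$tmp/f$i.log"; done
# -n 2 caps each invocation at 2 arguments: 5 files -> 3 invocations
calls=$(find "$tmp" -name '*.log' | xargs -n 2 echo | wc -l | tr -d ' ')
echo "uploader started $calls times for 5 files"
rm -rf "$tmp"
```

With a real cluster the stand-in could become something like `xargs sh -c 'bin/hadoop fs -put "$@" /hdfs/input/path' _` — assuming your version of "fs -put" accepts multiple source paths — and by default xargs already splits batches so each one stays under the system argument-length limit.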
1. What if I write "bin/hadoop fs -put /input/path/*.log
/hdfs/input/path"? Will the "*" be passed through to hadoop, so that hadoop
opens the files one by one, or will the "*" be expanded by "/bin/bash"? If
the latter, what happens if the expanded command line is too long for bash
itself (more than 32768 characters, for example, if there are too many
files)?
2. I could merge the many small files into "packs" (cat filename >> pack),
about 100MB per pack, and then put the packs into HDFS. But what if there
are very many files and the total size of all the data is several GB? Then
I would need free HDD space equal to (input data size) * 2 for that
operation...

thank you
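On question 2, the 2x free-space requirement can be avoided by deleting each small file immediately after it is appended to the current pack, so the extra space needed at any moment is roughly one pack (~100MB), not a second copy of all the data. A rough sketch with tiny sizes for illustration — the pack naming, the threshold, and the commented-out hadoop line are assumptions, not a tested recipe:

```shell
# Sketch: stream small files into fixed-size "packs", deleting each source
# file right after it is appended, so extra disk usage stays ~one pack.
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6; do printf 'x%.0s' $(seq 100) > "$tmp/f$i.log"; done  # 100 B each
limit=250                       # pack threshold (stand-in for ~100MB)
packno=0; size=0
pack="$tmp/pack.$packno"; : > "$pack"
for f in "$tmp"/f*.log; do
  size=$(( size + $(wc -c < "$f") ))
  cat "$f" >> "$pack"
  rm -f "$f"                    # free the space immediately
  if [ "$size" -ge "$limit" ]; then
    # real version: bin/hadoop fs -put "$pack" /hdfs/input/path && rm -f "$pack"
    packno=$((packno + 1)); size=0
    pack="$tmp/pack.$packno"; : > "$pack"
  fi
done
echo "pack.0 bytes: $(wc -c < "$tmp/pack.0")"
echo "pack.1 bytes: $(wc -c < "$tmp/pack.1")"
```

With 6 files of 100 bytes and a 250-byte threshold, the loop produces two full packs; in the real version each finished pack would be pushed to HDFS and removed before the next one is started.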

Discussion Overview
group: common-user
posted: Jun 30, '09 at 10:22p
active: Jul 1, '09 at 12:25a

2 users in discussion

Pavel kolodin: 1 post
Jason hadoop: 1 post


