Scenario:
Hadoop version: 0.20.2
MR coding will be done in Java.
Just starting out with my first Hadoop setup. I would like to know whether
there are any best-practice ways to load data into the DFS. I have
(obviously) put data files into HDFS manually using the shell commands
while playing with the setup, but going forward I will be retrieving
large numbers of data feeds from remote, third-party locations and
loading them into Hadoop for later analysis.

What is the best way to automate this? Is it to gather the retrieved
files into known local locations and then automate the put into HDFS via
a script (something like the sketch below), or is there some other
practice? I haven't been able to find a specific use case yet; all the
docs cover the basic fs commands without giving much detail about more
advanced setups.
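For what it's worth, this is roughly what I have in mind: a small Java
loader using the Hadoop FileSystem API that a cron job (or whatever
fetches the feeds) could run after each batch arrives. The paths and class
name here are just placeholders I made up, not anything from an existing
setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FeedLoader {
        public static void main(String[] args) throws Exception {
            // Local staging file dropped by the feed retrieval job,
            // and the HDFS directory it should land in (placeholder paths).
            Path localFeed = new Path("/data/staging/feed-001.txt");
            Path hdfsTarget = new Path("/user/feeds/incoming/");

            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Equivalent to 'hadoop fs -put <local> <hdfs dir>'.
            fs.copyFromLocalFile(localFeed, hdfsTarget);
            fs.close();
        }
    }

Is scripting something along these lines the normal approach, or do
people use a different mechanism for ongoing feed ingestion?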
thanks for any info
regards