Jason hadoop
at Feb 3, 2009 at 4:24 am
If you have a large number of FTP URLs spread across many sites, simply put
them in a file, set that file as your Hadoop job input, and force the input
splits to a size that gives you good distribution across your cluster.
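
A minimal sketch of that approach, assuming the newer
org.apache.hadoop.mapreduce API: NLineInputFormat caps each split at a few
URL lines so the fetches spread across the cluster, and each map task streams
its URLs straight into HDFS. The /data/ftp-mirror target path and the
5-lines-per-split setting are illustrative assumptions, not anything from the
original thread.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FtpFetchJob {

  public static class FetchMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String ftpUrl = value.toString().trim();
      if (ftpUrl.isEmpty()) {
        return;
      }
      // Derive an HDFS target from the URL's file name (assumption: file
      // names are unique enough; adapt the naming scheme as needed).
      URL url = new URL(ftpUrl);
      String name = new Path(url.getPath()).getName();
      Path target = new Path("/data/ftp-mirror", name);

      FileSystem fs = FileSystem.get(context.getConfiguration());
      InputStream in = url.openStream();          // JDK handles ftp:// URLs
      FSDataOutputStream out = fs.create(target, true);
      try {
        IOUtils.copyBytes(in, out, 4096, false);  // stream FTP -> HDFS
      } finally {
        in.close();
        out.close();
      }
      context.getCounter("ftp", "files-fetched").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "ftp-fetch");
    job.setJarByClass(FtpFetchJob.class);
    job.setMapperClass(FetchMapper.class);
    job.setNumReduceTasks(0);                     // map-only job

    // Small splits: a handful of URLs per map task, so the retrievals are
    // distributed across the cluster rather than handled by one mapper.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    NLineInputFormat.setNumLinesPerSplit(job, 5); // tune for your cluster

    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The lines-per-split value is the knob the reply refers to: lower it and more
map tasks run in parallel, each fetching fewer files.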
On Mon, Feb 2, 2009 at 3:23 PM, Steve Morin wrote:
Does anyone have a good suggestion on how to submit a Hadoop job that
will split the FTP retrieval of a number of files for insertion into
HDFS? I have been searching Google for suggestions on this matter.
Steve