Does anyone have a good suggestion on how to submit a Hadoop job that
will split the FTP retrieval of a number of files for insertion into
HDFS? I have been searching Google for suggestions on this matter.
Steve

  • Jason hadoop at Feb 3, 2009 at 4:24 am
    If you have a large number of FTP URLs spread across many sites, simply set
    the file listing them as your Hadoop job input, and force the input split to
    be a size that gives you good distribution across your cluster.

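    A minimal sketch of this split-size trick, assuming the old mapred API
    (current in early 2009); the job class and input path are illustrative.
    The default FileInputFormat treats the requested map count as a hint,
    dividing the total input size by it to pick a target split size, so
    asking for many maps forces many small splits:

        JobConf conf = new JobConf(UrlFetchJob.class); // hypothetical job class
        conf.setJobName("ftp-url-fetch");

        // The file of FTP URLs, one per line, is the job input.
        FileInputFormat.setInputPaths(conf, new Path("/input/ftp-urls.txt"));

        // Hint only: target split size = total input size / requested maps,
        // so a large value spreads the URL list across many map tasks.
        conf.setNumMapTasks(200);

        conf.setNumReduceTasks(0); // map-only: each mapper fetches its URLs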
  • Tom White at Feb 3, 2009 at 9:44 am
    NLineInputFormat is ideal for this purpose. Each split will be N lines
    of input (where N is configurable), so each mapper can retrieve N
    files for insertion into HDFS. You can set the number of reducers to
    zero.

    Tom
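
    A self-contained sketch of Tom's suggestion, again with the old mapred
    API; the class names, the /ingest target directory, and N = 10 are
    illustrative, and error handling for unreachable URLs is omitted:

        import java.io.IOException;
        import java.io.InputStream;
        import java.net.URL;

        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IOUtils;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;
        import org.apache.hadoop.mapred.lib.NLineInputFormat;

        public class FtpFetchJob {

          // NLineInputFormat hands each mapper N lines; each line is one FTP URL.
          public static class FetchMapper extends MapReduceBase
              implements Mapper<LongWritable, Text, Text, NullWritable> {

            private JobConf conf;

            public void configure(JobConf job) {
              this.conf = job;
            }

            public void map(LongWritable offset, Text line,
                            OutputCollector<Text, NullWritable> output,
                            Reporter reporter) throws IOException {
              String url = line.toString().trim();
              if (url.isEmpty()) {
                return;
              }
              // Status updates count as progress, so long downloads
              // don't trip the task timeout.
              reporter.setStatus("fetching " + url);

              // Hypothetical target dir; name the HDFS file after the remote one.
              Path dest = new Path("/ingest/" + new Path(url).getName());
              FileSystem fs = dest.getFileSystem(conf);

              // java.net.URL has a built-in ftp:// protocol handler.
              InputStream in = new URL(url).openStream();
              // copyBytes with close=true closes both streams when done.
              IOUtils.copyBytes(in, fs.create(dest), conf, true);

              output.collect(line, NullWritable.get());
            }
          }

          public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(FtpFetchJob.class);
            conf.setJobName("ftp-fetch");

            conf.setInputFormat(NLineInputFormat.class);
            // N = 10: each map task retrieves ten files.
            conf.setInt("mapred.line.input.format.linespermap", 10);

            conf.setMapperClass(FetchMapper.class);
            conf.setNumReduceTasks(0); // map-only, as Tom suggests

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(NullWritable.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));  // URL list
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
          }
        }

    Run it with the URL list file as the first argument and a scratch output
    directory as the second; the fetched files land under the target directory,
    and the (URL, null) output records serve only as a log of what was fetched.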

Discussion Overview
group: common-user
categories: hadoop
posted: Feb 2, '09 at 11:23p
active: Feb 3, '09 at 9:44a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
