Hi,

In an MR step, I need to extract text from various files (using Tika). I
put the text extraction into reduce() because I am writing the extracted
text to the output on HDFS. But now it occurs to me that I might as well
have put it into map() and used the default reduce(), which will write
every map() result out. Is that true?

Thank you,
Mark


  • Stuart White at Apr 21, 2009 at 3:46 am
    Unless you need the hashing/sorting provided by the reduce phase, I'd
    recommend placing your logic in your mapper and, when setting up your
    job, calling JobConf#setNumReduceTasks(0), so that the reduce phase
    won't be executed. In that case, any records emitted by your mapper
    will be written to the output.

    http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/JobConf.html#setNumReduceTasks(int)
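
The difference between the two setups can be sketched in plain Java with no Hadoop dependency. This is only a toy model under stated assumptions: the class MapOnlySketch and its methods are hypothetical names, extractText() is a stand-in for the Tika call, and a single reducer is assumed. It shows that with zero reduce tasks the mapper's records are written as emitted, while an identity reduce first groups and sorts them by key and then writes the same records.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOnlySketch {

    // Hypothetical stand-in for the Tika extraction step from the original post.
    static String extractText(String rawContent) {
        return rawContent.trim();
    }

    // Map phase: emit one (fileName, extractedText) record per input, in input order.
    static List<Map.Entry<String, String>> mapPhase(List<Map.Entry<String, String>> inputs) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        for (Map.Entry<String, String> in : inputs) {
            emitted.add(new SimpleEntry<>(in.getKey(), extractText(in.getValue())));
        }
        return emitted;
    }

    // Zero reduce tasks: every record emitted by the mapper goes straight to the output.
    static List<String> runMapOnly(List<Map.Entry<String, String>> inputs) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, String> rec : mapPhase(inputs)) {
            output.add(rec.getKey() + "\t" + rec.getValue());
        }
        return output;
    }

    // Identity reduce (one reducer assumed): records are grouped and sorted by key,
    // then written out unchanged.
    static List<String> runWithIdentityReduce(List<Map.Entry<String, String>> inputs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> rec : mapPhase(inputs)) {
            grouped.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            for (String value : group.getValue()) {
                output.add(group.getKey() + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> inputs = new ArrayList<>();
        inputs.add(new SimpleEntry<>("b.pdf", "  text of b "));
        inputs.add(new SimpleEntry<>("a.doc", " text of a"));
        System.out.println(runMapOnly(inputs));
        System.out.println(runWithIdentityReduce(inputs));
    }
}
```

Both runs produce the same set of records. Only the ordering differs, and the identity reduce pays for a sort that the extraction step does not need. In a real job the map-only variant corresponds to the JobConf#setNumReduceTasks(0) call mentioned above.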


Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 21, '09 at 3:25a
active: Apr 21, '09 at 3:46a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
